IoT & Connectivity Tips

Thundering Herd Scenarios in ThingWorx Written by Jim Klink, Edited by Tori Firewind Introduction The thundering herd topic is quite vast, but it can be broken down into two main categories: the “data flood” and the “reconnect storm”. One category involves what happens to the business login (the “data flood” scenario) and affects both Factory and Connected Products use cases. The other category involves bringing many, many devices back online in a short time (the “reconnect storm” scenario), which largely influences Connected Products scenarios. Citation: https://gfp.sd.gov/buffalo-roundup/ Think of Connected Products as a thundering stampede of many small buffalo, which then makes a Factory thundering herd scenario a stampede of a couple massive brontosaurus, much fewer in number, but still with lots of persisted data to send back in. This article focuses in on how to manage the “reconnect storm” scenario, by delaying the return of individual buffalo to reduce the intensity of the stampede. Find here the necessary insights on how to configure your ThingWorx edge applications to minimize the effect of a server down scenario. The C-SDK will be used for examples, but the general principles will apply to any of the ThingWorx edge options (EMS, .Net SDK, Java SDK). This article also references the ExampleAgent application which is built using the C-SDK. The ExampleAgent is available for download as an attachment to this post. It offers an easily configurable edge solution for Windows and Linux that can be used for the following purposes: Foundation for rapid development of a robust custom edge application based on the ThingWorx C-SDK for use by customers and partners. Full featured, well documented, ‘C’ source code example of developing an application using the ThingWorx C-SDK. A “local” issue is one which affects a single agent, a loss of connectivity due to hardware malfunction or local network issues. Local issues are quite common in the IoT world, and recovery usually isn’t too much of a challenge. A “global” issue occurs when many agents disconnect simultaneously, usually because there is an issue with the ThingWorx server itself (though the Load Balancer, Connection Server, or web hosting software could also be the source). Perhaps it is a scheduled software update, perhaps it is unexpected downtime due to issues, but either way, it’s important to consider how the fleet of agents will respond if ThingWorx suddenly becomes unavailable. There are two broad issues to consider in a situation like this. One is maintaining the agent’s data so that it can be sent when the connection becomes available again. This can be done in the C-SDK using an offline file storage system, which includes properties, events, and services. Offline storage is configured in the twConfig.h file in the C SDK. The second issue the number of Agents seeking to reconnect to the server in a short period of time when the server is available. Of course, if revenue is based on uptime, perhaps persisting data is less critical and can be lost, making things simple. However, in most cases, this data will need to be stored on the edge device until reconnect. Then, once the server comes back up, suddenly all of this data comes streaming in from all of the many edge devices simultaneously. This flood of both data and reconnection of a multitude of agents can create what is called a “thundering herd” scenario, in which ThingWorx can become backlogged with data processing requests, data can be lost if the queues are overwhelmed, or worst-case, the Foundation server can become unresponsive once again. This is when outages become costly and drag on longer than necessary. Several factors can lead to a thundering herd scenario, including the number of agents in the fleet, the amount of stored data per agent, the amount of data ordinarily sent by these devices, which is sent side-by-side with the stored data upon reconnection, and how much processing occurs once all of this data is received on the Foundation server. The easiest way to mitigate a potential thundering herd scenario, and this is considered a ThingWorx best practice as well, is to randomize the reconnection of devices. Each agent can be configured to delay itself by a random amount of time before attempting to reconnect after a loss of connectivity. This random delay then distributes the number of assets connecting at a time over a longer period, thus minimizing the impact of the reconnections on ThingWorx. There are several configuration settings that help in this regard. Configuring the Herd (C-SDK) The C-SDK is great at managing agent connectivity, having a lot of options for fine-tuning the connections. The web-socket connection is managed by the SDK layer of the edge device (which also manages the retry process). To review the source code for how connections are made, see the C-SDK file found here: src\api\twApi.c, specifically the function called twApi_Connect(). The ExampleAgent uses custom configuration files to manage this process from the application layer, a more robust and complete solution. Detailed here are the configuration options in the ExampleAgent attached to this post, most of which can be found in its ws_connection.json configuration file: connect_timeout is used throughout the C-SDK as the time to wait for a web-socket connection to be established (i.e. the ‘timeout’ value). This is the maximum delay for the socket to be established or to send and receive data. If it is established sooner, then a success code is returned. If a connection is not established in the configured timeout period, then an error is returned. Setting this value to 10 seconds is reasonable, for reference. connect_retries is the number of times the SDK will attempt to establish a connection before the twApi_Connect() function returns an error. Setting this to -1 will trigger the SDK to stay in the loop infinitely until a connection is established. connect_retry_interval is the delay between connection retries. max_connect_delay is used as a delay before even entering the loop, that which uses the connect_retries and connect_retry_interval parameters to establish the connection. The SDK function twAddConnectionDelay() is called, which delays by a random amount of time between 0 seconds and the value given by this parameter. This random delay is only used once per call to twApi_Connect(). This is therefore the parameter most critical to preventing thundering herd scenarios (as discussed above). Configuring the SDK agents to reconnect in this way is critical, but there are also some drawbacks, namely that while the twApi_Connect() function is running, there is no clean way of shutting the agent down. Likewise, the agent only does ONE randomized delay per call of the twApi_Connect() function, meaning that if reconnection cannot occur immediately, it’s still possible for many agents to try to reconnect at once. Consider this when determining what values to assign to these parameters. ExampleAgent Design The ExampleAgent provided here is a fully implemented, configurable application, like the EMS in terms of functionality, but containing only simulated data. The data capture component is missing here and has to be custom developed. Attached alongside this source code is extensive documentation that explains how to get the application set up and configured. This isn’t meant to be used directly in a production environment. Please note that the ExampleAgent is provided as-is; it is not an officially released product by PTC. This disclaimer includes the ExampleAgent source code, build process, documentation, deliverables as well as any ExampleAgent modifications to the official releases of the C-SDK or the SCM extension product. Full and sole responsibility for the use, deployment, reliability, and accuracy of any ExampleAgent related code, documentation, etc. falls to the user, and any use of the ExampleAgent is an implicit agreement with this disclaimer. The ExampleAgent was developed by PTC sales and services to help in the Edge application development process. For assistance, support, or additional development, an authorized statement of work is needed. Please Note: PTC support is not aware of the existence of the ExampleAgent and cannot provide assistance. Because of the small downside to configuring the twApi_Connect() function directly as discussed above, there is alternative approach given here as well. The ExampleAgent module ConnectionMgr.c controls the calling of the twApi_Connect() on a dedicated connection thread. The ConnectionMgrThreadFunction() contains the source code necessary to understanding this process. The ConnectionMgr.c workflow and source code visualization via Microsoft Visual Studio are in the diagrams below. The ExampleAgent defines its own randomized delay to mitigate the thundering herd scenario while still deploying an edge system that responds to shut down requests cleanly. In this case, the randomized delay is configured by the parameter reconnect_random_delay_seconds in the agent_config.json file. Since the ConnectionMgrThreadFunction() controls the calling of twApi_Connect(), the ConnectionMgrThreadFunction() will delay the randomized value EVERY time before calling this reconnect function. A separate thread is created to call the reconnect function so that there are still resources available for data processing and to check for shutdown signals and other conditions. Recommended Values These recommendations are based around managing the reconnection process from the application layer. These may be different if the C-SDK is configured directly, but creating application layer management is recommended and provided in the ExampleAgent attached. The ExampleAgent is configured by default to simplify the SDK layer’s involvement. These configuration options tell the SDK layer to try to connect just once, after just 1 second: There is no official recommendation for the above values due to the fact that every use case is different and will require different fine tuning to work well. Then this setting here handles the retry process from within the application layer of the ExampleAgent: Conclusion To reduce the chances of a thundering herd scenario, configure the fleet to reconnect after differing random delays. The larger the random delay times, the longer it takes for the fleet to come back online and fleet data to be received. While more complex ThingWorx deployment architectures (such as container-based deployments like Kubernetes or Thingworx High Availability (HA) clusters) can also help to address the increased peak load during a thundering herd event, randomized reconnect delays can still be an effective tool.

Aug 25, 2021

ThingWorx Docker Overview and Pitfalls to Avoid by Tori Firewind of the IoT EDC Containers are isolated and can run side-by-side on the same machine, but they share the host OS, making them more efficient in terms of memory usage and scalability. Docker is a great tool for deploying ThingWorx instances because everything is pre-packaged within the Docker image and can be stored in a repository ready for deployment at any time with little configuration required. By using a different container for every component of an application, conflicting dependencies can be avoided. Containers also facilitate the dev ops process, providing consistent application deployments which can be set up, taken down, and tested automatically using scripts. Using containers is advantageous for many reasons: simplified configuration, easier dev ops management, continuous integration and deployment, cost savings, decreased delivery time for new application versions, and many versions of an application running side-by-side without any wasted resources setting them up or tearing them down. The ThingWorx Help Center is a great resource for setting up Docker and obtaining the ThingWorx Docker files from the PTC Software Downloads website. The files provided by PTC handle the creation of the image entirely, simplifying the process immensely. All one has to do is place the ThingWorx version and all of the required dependencies in the staging folder, configure the YML file, and run the build scripts. The Help Center has all of the detailed information required, but there are a few things worth noting here about the configuration process. For one thing, the platform-settings.json file is generated based on the options given in the YML file, so configuration changes made within this configuration file will not persist if the same options aren’t given in the YML file. If using Docker Desktop to run an image on a Windows machine, then the configuration options must be given in an ENV file that can be referenced from the command used to start the image. The names of the configuration parameters differ from the platform-settings.json file in ways that are not always obvious, and a full list can be found here. For example, if extension imports need to be enabled on a ThingWorx instance running in Docker, then the EXTPKG_IMPORT_POLICY_ENABLED option must be added to the environment section of the YML file like this: environment: - "CATALINA_OPTS=-Xms2g -Xmx4g" # NOTE: TWX_DATABASE_USERNAME and TWX_DATABASE_PASSWORD for H2 platform must # be set to create the initial database, or connect to a previous instance. - "TWX_DATABASE_USERNAME=dbadmin" - "TWX_DATABASE_PASSWORD=dbadmin" - "EXTPKG_IMPORT_POLICY_ENABLED=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JARRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_CSSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSONRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_WEBAPPRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_ENTITIES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_EXTENTITIES=true" - "EXTPKG_IMPORT_POLICY_HA_COMPATIBILITY_LEVEL=WARN" - "DOCKER_DEBUG=true" - "THINGWORX_INITIAL_ADMIN_PASSWORD=Pleasechangemenow" Note that if the container is started and then stopped in order for changes to the YML file to be made, the license file will need to be renamed from "successful_license_capability_response.bin" to "license_capability_response.bin" so that the Foundation server can rename it. Failing to rename this file may cause an error to appear in the Application Log, and the server to act as if no license was ever installed: "Error reading license feature info for twx_realtime_data_sub". In Docker Desktop on a Windows machine, create a file called whatever.env and list the parameters as shown here: Then, reference this environment file when bringing up the machine using the following command in Powershell: docker run -d --env-file h2.env -p 8080:8080 -v ${pwd}/ThingworxPlatform:/ThingworxPlatform -v ${pwd}/ThingworxStorage:/ThingworxStorage -it <image_id> Notice in this command that the volumes for the ThingworxPlatform and ThingworxStorage folders are specified with the “-v” options. When building the Docker image in Linux, these are given in the YML file under the volumes section like this (only change the path to local mount on the left side of the colon, as the container mount on the right side will never change): volumes: - ./ThingworxPlatform:/ThingworxPlatform - ./ThingworxStorage:/ThingworxStorage - ./tomcat-logs:/opt/apache-tomcat/logs Specifying the volumes this way allows for ThingWorx logs and configuration files to be accessed directly, a crucial requirement to debugging any issues within the Foundation instance. These volumes must be mapped to existing folders (which have write permissions of course) so that if the instance won’t come up or there are any other issues which require help from Tech Support, the logs can be copied out and shared. Otherwise, the Docker container is like a black box which obscures what is really going on. There may not be any errors in the Docker logs; the container may just quit without error with no sign of why it won’t stay up. Checking the ThingWorx and Tomcat logs is necessary to debugging, so be sure to map these volumes correctly. Once these volumes are mapped and ThingWorx is successfully making use of them, adding a license file to the Docker instance is simple. Use the output in the ThingworxPlatform folder to obtain the device ID, grab a valid license file, and put it right back into that ThingworxPlatform folder, exactly the same way as on a regular instance of ThingWorx. However, if the Docker image is being used for a dev ops process, a license may not be necessary. The ThingWorx instance will work and allow development for a time before the trial license expires, which normally will be enough time for developers to make their changes, push those changes to a repository, and tear the container down. Another thing worth noting about ThingWorx Docker image creation is that the version of Java supplied in the staging folder must match the compatibility requirements for each version of ThingWorx. This is the version of Java used by the container to run the Foundation server. In versions of ThingWorx 9.2+, this means using the Amazon Corretto version of Java. The image absolutely will not start ThingWorx successfully if older versions of Java are used, even if the scripts do successfully build the image. Also note that in the newer versions of ThingWorx Docker, the ThingWorx Foundation version within the build.env file is used throughout the Docker image creation process. Therefore, while the archive name can be hard-coded to whatever is desired, the version should be left as is, including any additional specifications beyond just the version number. For example, the name of the archive can be given as Thingworx-Platform-H2-9.2.0.zip (a prettier version of the archive name than is used by default), but the PLATFORM_VERSION should still be set to 9.2.0-b30 (which should be how it appears within the build.env file upon download of the ThingWorx Docker files). Paying attention to every note in the Help Center is critically important to using ThingWorx Docker, as the process is extensive and can become very complicated depending on how the image will be used. However, as long as the volumes are specified and the log files accessible, debugging any issues while bringing up a Docker-contained ThingWorx instance is fairly straightforward. Credits: Images borrowed from ThingWorx Docker Containerization Tech Talk by Adrian Petrescu

Jul 30, 2021

With ThingWorx, we can already use univariate anomaly alerts (on a single sensor value). However, in many situations, the readings from an individual sensor may not tell you much about the overall issue and a multivariate anomaly detector can be more useful. This post is intended to provide an overview of the Azure Anomaly Detector and how it can be integrated with ThingWorx. The attachment contains: A document with detailed instructions about the setup; A .csv file with the multivariate timeseries dataset; A .twx file with some entities that need to be imported in ThingWorx as well as the CSVParser extension that needs to be installed; A .zip file that will need to uploaded in an Azure Blob Container at some point in the setup

Jul 21, 2021

5 Common Mistakes to Developing Scalable IoT Applications by Tori Firewind and the IoT EDC Team Introduction To build scalable applications, it’s necessary to identify common mistakes and avoid them at the early stages of development. In an expert session this past month, the PTC Enterprise Deployment Team elaborated on why scalability is important and how to avoid the common development pitfalls in IoT. That video presentation has been adapted here for visual consumption of the content as well. What is Scalability and Why Does it Matter Enterprise ready applications can scale and easily be maintained, which is important even from day 1 because scalability concerns are the largest cause for delays to Go Lives. Applications balance many competing requirements, and performance testing is crucial to ensure an application is ready for Go Live. However, don't just test how many remote assets can connect at once, but also any metrics that are expected to increase in time, like the number of remote properties per thing, the frequency of reporting from those properties, or the number of users accessing the system at once. Also consider how connecting more assets will affect the user experience and business logic, and not just the ability to ingest data. Common Mistake 1: Edge Property Updates Because ThingWorx is always listening for updates pushed from the Edge and those resources are always in use, pulling updates from the Foundation side wastes resources. Fetch from remote every read is essentially a round trip, so it's slower and more memory intensive, but there are reasons to do it, like if the quality tag is needed since the cache doesn't store it. Say a property is pushed at 11:01, and then there's a network issue at 11:02. If the property is pulled from the cache, it will pull the value sent at 11:01 without any indication of there being a more recent value on the Edge device. Most people will use the default options here: read from server cache, which relies on the Edge to push updates, and the VALUE push type, and configuring a threshold is a good idea as well. This way, only those property updates which are truly necessary are sent to the Foundation server. Details on property aspects can be found in KCS Article 252792. This is well documented in another PTC Community post. This approach is necessary and considered a best practice if there is event logic which depends on multiple properties at once. Sending all of the necessary properties to determine if an event should fire in one Infotable ensures there is no need to query the database each time a property update comes in from the Edge, which ensures independent business logic and reduces the load on the database to improve ingestion performance. This is a very broad topic and future articles will address it more specifically. The When Disconnected property aspect is a good way to configure what happens with Edge property values in a mass disconnect scenario. If revenue depends on uptime, consider losing any data that changes while a device is disconnected. All of the updates can be folded into a single value if the changes themselves aren't needed but an updated value is needed to populate remote properties upon reconnect. Many customers will want to keep all of their data, even when a device is offline and use data stores. In this case, consider how much data each Edge device can store (due to memory limitations on the devices themselves), and therefore how long an outage can last before data is lost anyway. Also consider if Foundation can handle massive spikes in activity when this data comes streaming in. Usually, a Connection Server isn't enough. Remember that the more data needs to be kept, the greater the potential for a thundering herd scenario. Handling a thundering herd scenario goes beyond sizing considerations. It is absolutely crucial to randomize the delay each device will wait before attempting to reconnect. It should be considered a requirement to have the devices connect slowly and "ramp up" over time for multiple reasons. One is that too much data coming in too fast could overwhelm the ingestion queue and result in data loss. Another is that the business logic could demand so many system resources, that the Foundation server crashes again and again and cannot be recovered. Turning off the business logic it isn't possible if the downtime is unexpected, so definitely rely instead on randomized reconnection times for Edge devices. Common Mistake 2: Overlooking Differences in HA To accommodate a shared thing model across many servers, changes had to be made in how the thing model is stored and the model tree is walked by the Foundation servers. Model information is no longer cached at the Thing level, and the model tree is therefore walked every time model information is needed, so the number of times a Thing is directly referenced within each service should be limited (see the Help Center for details). It's best to store whatever information is needed from a Thing in an Infotable, making the Things[thingName] reference a single time, outside of any loops. Storing the property definitions outside of the loop prevents the repetitious Thing references within the service, which otherwise would have occurred twice for each property (for both the name and the description), and then again for every single property on the Thing, a runtime nightmare. Certain states previously held in memory are now shared across the cluster, like property values, Thing states, and connection statuses. Improvements have been made to minimize the effects of latency on queries, like how they now only return property values on associated Thing Shapes or Thing Templates. Filtering for properties on implementing Things is still possible, but now there is a specific service to do it, called GetThingPropertyValues (covered in detail in the Help Center). In the script shown above, the first step is a query to get the names of all implementing things of a particular Thing Shape. This is done outside of any loops or queries, so once per service call. Then, an Infotable is built to store what would have been a direct reference to each thing in a traditional loop. This is a very quick loop that doesn't add much by way of runtime since it is all in memory, with no references to the thing model or the database, instead using the results of the first query to build the Infotable. Finally, this thing reference Infotable is passed into the new service GetThingPropertyValues to retrieve all of the property info for all of these things at once, thereby only walking the thing model once. The easiest mistake people would make here is to do a direct thing reference inside of a loop, using code like Things[thingName].Get() over and over again, thereby traversing the thing model repeatedly and adding a lot of runtime. QueryImplementingThingsOptimized is another new service with new parameters for advanced configuration. Searches can now be done on particular networks or to particular depths, and there's an offset parameter that allows for a maximum number of items to be returned starting at any place in the list of Things, where previously if you needed the Things at the end of the list, you had to return all of the Things. All of these options are detailed in the Help Center, as well as the restrictions listed in the image above. Common Mistake 3: Async Service Misuse Async services are sometimes required, say if a user has to trigger many updates on many remote things at once by the click of a button on a mashup that should not be locked up waiting for service completion. Too many async service calls, though, result in spikes in activity and competition for resources. To avoid this mistake, do not use async unless strictly necessary, and avoid launching too many async threads in parallel. A thread dump will show how many threads there are and what they are doing. Common Mistake 4: Thread Pool Overload Adding more threads to the pool may be beneficial in certain circumstances, like if the threads are waiting on other resources to complete their tasks, look stuff up in the database (I/O), or unlock data that can only be accessed one thread at a time (property writes). In this case, threads are waiting on other resources, and not the CPU, so adding more threads to the pool can improve performance. However, too many threads and performance degradation will occur due to increased contention, wasted CPU cycles, and context switches. To check if there are too many or not enough threads in the pool, take thread dumps and time the completion of requests in the system. Also watch the subsystem memory usage, and note that the side of the queue should never approach the max. Also consider monitoring the overall performance of the system (CPU and Memory) with a tool like Grafana, and remember that a good performance test properly exercises all of the business logic and induces threads in a similar way to real world expectations. Common Mistake 5: Stream Etiquette Upserts, or updates to database tables, are expensive operations that can interfere with ingestion if they are performed on the wrong tables. This is why Value Stream and Stream data should never be updated by end users of the application. As described in the DGIS document on best practices, aggregation is the key to unlocking optimal performance because it reduces the size of database tables that require upserts. Each data structure shown here has an optimal use in a well-designed ThingWorx application. Data Tables are great for storing overview information on all of the Things in one view, and queries on this data source are the fastest. Update this data source as often as possible (by timer), allowing enough time for updates to be gathered and any necessary calculations made. Data Tables can also be updated by end users directly because each row locks one at a time during updates. Data Tables should be kept as small as possible to improve performance on mashups, so for instance, consider using one to show all Things per region if there are millions of Things. Roll up information is best stored here to avoid calculations upon mashup load, and while a real-time view of many thousands of things at once is practically impossible, this option allows for a frequently updates overview of many things, which can also drill down to other mashup views that are real-time for one Thing at a time. Value Streams are best used for data ingestion, and queries to these should be kept to a minimum, largely performed by the roll up logic that populates the Data Tables mentioned above. Queries that chart all of the data coming in are best utilized on individual Thing views so that only a handful of users are querying the same data sources at a time. Also be sure to use start and end dates and make use of the "source" field to improve query performance and create a better user experience. Due to the massive size of the corresponding database tables, it's best to avoid updating Value Streams outside of the data ingestion process altogether. Streams are similar, but better for storing aggregated, historical data. Usually once per day or per week (outside of business hours if possible), Value Stream data will be smoothed or reduced into less data points and then stored into Streams. This allows for data to be stored for longer periods of time on the server without using up as much memory or hurting query performance. Then the high volume ingested data sources can be purged frequently, as discussed below. Infotables are the most memory intensive, and are really designed to hold only a small number of rows at a time, usually to facilitate the business logic. Sometimes they will be stored in Streams or Data Tables if they aren't expected to grow larger (see the DGIS Coffee Machine App for an example). Infotables should never be logged; if they are used to transmit Edge property updates (like in the Property Set Approach), they should be processed into other logged (usually local) properties. Referring to the properties themselves is how to get real-time information on a mashup, say by using the GetProperties service and its auto-update option, which relies on internal websockets. This should be done on individual Thing views only, and sizing considerations need to be made if there will be many of these websockets open at once, say if there are many end users all viewing real-time data at a time. In the newer versions of ThingWorx, these cannot be updated directly, so find the system object called ThingWorxPersistenceProvider and use the service UpdateStreamDataProcessingSettings. ThingWorx Foundation processes data received from remote devices in batches in order to manage the data flow and reduce database churn. All of these settings configure how large those batches are and how frequently they are flushed to the database (detailed in full in KCS Article 240607). This is very advanced configuration that heavily depends on use case and infrastructure, but some info applies to most people: adjusting the scan rate is usually not beneficial; a healthy queue should never approach the max limit; and defaults differ by database because they function differently. InfluxDB generally works better when there are less processing threads and higher numbers of things per thread, while PostgresDB can have a lot of threads, preferably with less things per thread. That's why the default values shown here are given as the same number of threads (and this can be changed), but Influx has a larger block size and size threshold because it can handle more items per thread. Value Streams ingest all data into the Foundation server, and so the database tables that correspond with these data sources grow very large, very quickly and need to be purged often and outside of business hours, usually once a day or once per week. That's why it's important to reduce the data down to less points and push them into Streams for historical reference. For a span of years, consider a single point a day might be enough, for a span of hours, consider a data point a minute. Push aggregated data into Streams and then purge the rest as soon as it is no longer needed. In Conclusion

Jun 29, 2021

Those who have been working with ThingWorx for many years will have noticed the work done around ingress stress testing and performance optimization. Adding InfluxDB as a time-series data persistence provider really helped level up these capabilities while simultaneously decreasing the overall resources required by the infrastructure. However with this ease comes a hidden challenge: query and data processing performance to work it into something useful. Often It's Too Much Data In general most customers that I work with want to collect far too much data -- without knowing what it will be used for, or what processing will be required in order to make it usable and useful. This is a trap in general with how many people envision IoT projects, being told by infrastructure providers that cloud storage and compute resources are abundant and cheap and that they should get as much data as possible. This buildup of data means that more effort needs to be spent working it into something useful (data engineering/feature extraction) and addressing common data issues (quality, gaps, precision, etc.). This might be fine for mature companies with large data analytics teams; however this is a makeup that I've only seen in the largest of our customers. Some advice - figure out what you need and how you'll use it, and then collect that. Work on extracting value today rather than hoping that extra data collected now will provide some insights years from now. Example - Problem Statement You got your Thing Model designed, and edge devices connected. Now you've got data flowing in and being stored every 5 seconds in InfluxDB. Great progress! Now on to building the applications which cover the various use cases. The raw data is most likely going to need to be processed and potentially even significantly transformed into other information in order to make it useful. Turning a "powered on and running" BOOLEAN to an "hour meter" INTEGER is a simple example. Then you may need to provide a report showing equipment run time hours by day over a month. The maintenance team may also have asked to look for usage patterns which lead to breakdowns, requiring extracting other data points from the initial one like number of daily starts, average daily run time, average time between restarts. The problem here is that unless you have prepared these new data points and stored them as well (say in a Stream), you are going to have to build these data sets on the fly, and that can be time and resource intensive and not give you the response time expected. As you can imagine, repeatedly querying and processing large volumes of unchanging raw data is going to have resource and time implications - so this is why data collection and data use need to be thought about separately. Data Engineering In the above examples, the key is actually creating new data points which are calculated progressively throughout normal operation. This not only makes the information that you want available when you need it - in the right format - but it also significantly reduces resource requirements by constantly reprocessing raw data. It also helps managing data purging, because as you create and store usable insights, you can eventually just archive away your old raw data streams. Direct Database Queries vs. Thingworx Data Services Despite the above being a rule of thumb, sometimes a simple well structured database query can get you exactly what you need and do so quite quickly. This is especially true for InfluxDB when working with extremely large time-series datasets. The challenge here is that ThingWorx persistence providers abstract away the complexity of writing ones own database queries, so we can't easily get at the databases raw power and are forced to query back more data than needed and work it into a usable format in memory (which is not fast). Leveraging the InfluxDB API using the ContentLoader Technique As InfluxDBs API is 100% REST, we can access it using in-built ThingWorx Content Loader services. Check out this demonstration and explanation video where I talk about how to interact directly with InfluxDB in order to crush massive time-series data and get back much more usable and manageable data sets. It is important to note here that you should use a read-only database user here, as you should never modify the ThingWorx databases to avoid untested scenarios which may lead to data corruption. Optimizing ThingWorx query performance with the InfluxDB REST API - YouTube InfluxToolBox ThingWorx demo project (by T. Wobben)

Jun 27, 2021

ThingWorx 9.2 is here! Deploy an entire solution and all its dependencies in one click with Solution Central’s one-click deploy, garner deeper analytic insight with our new waterfall charts, and manage and authenticate users more seamlessly with an Azure Active Directory integration. Discover these features and more in my 9.2 preview post here! Review our release notes here and be sure to upgrade to 9.2! Stay connected, Kaya

Jun 24, 2021

We will host a live Expert Session: "5 Common Mistakes for Developing Scalable IoT Applications" on June 22nd, 11h00 EST. Please find below the description of the expert session and the registration link. Expert Session: 5 Common Mistakes for Developing Scalable IoT Applications Date and Time: June 22nd, 11h00 EST Duration: 1 hour Host: Tori Firewind, Mike Jasperson and Prachi Rath - Enterprise Deployment Center Registration Here: https://www.ptc.com/en/resources/iiot/webcast/5-common-dev-mistakes-for-scalable-iot-applications Description: To build scalable applications, it’s necessary to identify the common mistakes made and ensure to avoid them at the early stages of development. In this expert session, the PTC Enterprise Deployment Team will elaborate on why scalability is important and how one can avoid the common development pitfalls in IoT. Existing Recorded sessions can be found on support portal using the keyword ‘Expert Sessions’. You can also suggest topics for upcoming sessions using this small form. Here are some recorded sessions that might be of your interest. You can find recordings for the full library of webinars using the keyword ‘Expert Sessions’ in PTC support portal search Thingworx Active Active Clustering This session will cover the main aspects of the High Availability Clustering feature launched with the ThingWorx 9.0 release. Recoding Link Upgrade to Thingworx 9 – How to Plan / Evaluate Impacts This session highlights the key points you should evaluate to properly plan your upgrade to Thingworx 9. Recording Link Top 5 items to check for Thingworx Performance Troubleshooting How to troubleshoot performance issues in a Thingworx Environment? Here we cover the top 5 investigation steps that will help you understand the source of your environment issues and allow better communication with PTC Technical Support Recording Link

Jun 17, 2021

Hi, everyone! Today, we’re launching an exciting new series called “PTC Community Spotlights.” Each post in the series explores a community member’s experience with ThingWorx—how they’re using it, what their favorite part about ThingWorx is, and any tips or tricks they may have to share with the PTC Community. For the first installment, I spoke with @nmilleson of EAC. Check out our conversation below. Our first PTC Community Spotlight Speaker -- Nick Milleson of EAC Product Development Systems. @Kaya: Hi, @nmilleson, welcome! Thank you for taking the time to meet with me and volunteering to be our first ThingWorx Community spotlight! @nmilleson: Of course, @Kaya. Happy to be here. @Kaya: To start, can you tell me a little about yourself? @nmilleson: Absolutely. My name is Nick Milleson. I work as an IoT Solution Architect at EAC Product Development Systems (a PTC Partner). I’m located in Apple Valley, Minnesota, which is a suburb of the Twin Cities. @Kaya: Nice! We always love hearing from our partners about the awesome work they do. As a PTC Partner, what industries do you typically work in? @nmilleson: I consult for many, many different industries, including defense, transportation, medical devices, construction & aerospace. @Kaya: Wow, so what PTC products are you most familiar with? @nmilleson: My schooling is in mechanical engineering, so I’ve also used Creo, Windchill, and MathCAD. I have been working with the ThingWorx application and helping clients get the most out of ThingWorx for approximately 7 years. @Kaya: Seven years—that’s a while! Do you have any “ThingWorx” stories from over the years you can share with your community peers? @nmilleson: Sure thing. I think the coolest thing that I’ve done with ThingWorx was create a custom SVG infographic that featured animations, click events, zoom-ins, and heatmaps based on temperature deltas. It was a custom widget and it worked really well in ThingWorx. When I first started learning to use ThingWorx, I took apart an old RC car and hooked up an Arduino to the motors and steering. I was then able to control it using a ThingWorx mashup. Pretty fun! I’ll be sure to share a visual so people can check it out. Nick's awesome custom SVG infographic featuring a ton of neat functionality like zoom-ins & heatmaps. @Kaya: That’s awesome! Sounds like a fun time indeed. I saw that one of your first publications about ThingWorx for EAC was from 2015 and titled “Updating ThingWorx Using an Arduino Uno and a Serial Connection.” The ThingWorx platform has certainly evolved since then. What would you say is your favorite thing about ThingWorx today? @nmilleson: It sure has evolved. I would say my favorite thing is that it’s flexible enough to allow you the freedom to design all sorts of applications, while also providing you with all these great tools that make it easy to use as well. @Kaya: Thanks for that. I can see that you have been a member of the PTC Community for five years. Thank you for providing such great contributions. What do you enjoy most about the PTC Community? @nmilleson: I enjoy this Community because everyone seems very willing to help each other out, regardless of the complexity of the issue. I stick mostly with the IoT Developers section, but I’ll meander into the Manufacturing Apps and ThingWorx Ideas once in a while as well. @Kaya: Love to hear it. Now, so the PTC Community can learn a little more about you, how do you spend your time when you aren’t playing with ThingWorx or engaging on the PTC Community? @nmilleson: Great question. I have been a professional piano player for almost 20 years, so I’m often at a piano bar making music when I’m not doing software development with EAC. @Kaya: Awesome. Well those are all the questions I have for today. Thank you for sharing your experience with ThingWorx! Truly appreciate it. @nmilleson: Of course. Happy to be a part of it! Kaya, here. We love hearing from community members like @nmilleson about how ThingWorx creates value for them amongst a variety of use cases. If you’re active on the community and interested in being featured on the PTC Community Spotlight series, send me a direct message and we’ll get the ball rollin’. For now, we’ll let Nick “play us” out. Until next time, stay connected! -Kaya

Jun 16, 2021

Recently I have been accompanying an integration partner and end customer around an issue experienced with ThingWorx resource exhaustion. Early on it seemed like this was an issue with the ThingWorx Azure IoT Hub Connector as it would freeze up and become unresponsive. Following a root cause analysis it became clear that it was actually caused by a lack of a number of standard cloud design patterns, which if used would have automatically adapted operation of the overall solution to be far more resilient as well as resource optimized. The way that the logic was structured, it prioritized job execution on entities with the oldest last success time and would continue to retry these executions (IoT Direct Methods) every few seconds until successful. There were a number of problems here, but I'll unpack a few in order to tie the problem to the solution via design patterns. 1) No exception handling When the direct method execution failed/timed out or the system reported being unable to execute the remote service, this response was not used to adapt the solutions behavior. 2) No backoff retry mechanism As exceptions were not caught, an adaptive retry mechanism with incremental or exponential backoff could not be leveraged to limit the impact of the build up of the failing retries. 3) No exception tracking Tracking that exceptions were occurring and counting them would allow powering an exponential backoff retry algorithm (with jitter), a Cancel or Circuit Breaker pattern (stop doing something which is just broken), as well as provided alerting to address specific areas of the distributed solution experiencing issues. 4) Conflicting priorities It was interesting to see the manifestation of the conflicting interests of wanting to ensure checks and balances (had all needed data) and system resiliency. Retries and resource usage built up exponentially due to the transient error instead of backing them off. Trying so hard to get the needed data from failing sensors meant that operational sensors were deprioritized and their data was not received either - spreading the localized issue to the whole system. Around the time that I shared my recommendations and some examples of how to make the solution more resilient, one of my technical colleagues at Microsoft shared some extremely interesting and relevant design patterns documented by Microsoft as a part of the "Microsoft Azure Well-Architected Framework". This framework with included Design Patterns for specific cloud application goals allows applying well-known industry standard approaches to dealing with the challenges of large scale distributed enterprise systems (reliability, performance, cost optimization). She later then shared this blog post describing exactly the exponential backoff retry with jitter pattern which we had together recommended to the systems integrator. What's interesting for us ThingWorx people is that this framework from Microsoft is about well-architected cloud solutions and does not specifically reference the Azure stack, and as such many of these approaches and design practices can be employed in your ThingWorx applications. What are you waiting for? Go check them out!

Jun 10, 2021

In our interactions with PTC customers we often learn they have previously performed Analytics modeling in Python, Matlab, R, or even built home grown analyses in languages such as Java or C++. As expected, when adopting an Industrial Innovation Platform such as ThingWorx that also has its own ThingWorx Analytics module, customers do not want to reimplement everything from scratch and would rather integrate their previous work in the Smart Applications built in ThingWorx, leveraging a combination of their existing toolset together with ThingWorx Analytics modeling. That is certainly possible and there are multiple ways to do that. In this article we will focus on several general ways to make that happen, but it is important to keep in mind that language specific approaches are also possible and we are happy to discuss those in the specific context of the customer. Here are five different ways to bring existing Analytics into ThingWorx: If the task is to reuse an existing predictive model developed in a language such as Python/R/Matlab, typically one can export that model in PMML (Predictive Model Markup Language), an xml format, and import it in ThingWorx Analytics using the AnalyticsServer_ResultsThing -> UploadModel service. Libraries such as sklearn2pmml & r2pmml can be utilized towards that goal. The imported model can then be used in the same fashion as a ThingWorx Analytics developed model to power smart applications built in ThingWorx. If the Analysis involves more complex tasks than Predictive Modeling, such as custom data normalizations or non-standard Machine Learning models or home grown algorithms, one can use the options below. Call the ThingWorx exposed REST Web API from Python/Matlab/R/Java/Javascript. Every service from ThingWorx can be called that way, and the API can also be used to push analyses results into ThingWorx for further consumption, perhaps together with other sources of data such as sensor readings, in the smart applications built there. The documentation for the ThingWorx REST API can be found here. Expose the existing Analytics via using a thin layer of REST Web Services. For example, in Python, this can be done using Flask, with few lines of code. Then, the orchestration can happen from ThingWorx by calling the exposed Web Service and weaving the results back into smart applications. Often our customers' current architecture involves a relational database (e.g. SQL Server, Oracle, etc) that is powering the existing Analytics, and stores the end results (predictions, correlations, etc). In this scenario, we can connect ThingWorx directly to that database to read these results. Finally, in the case of complex Analytics, where a tighter integration with ThingWorx is desired, existing Analytics / algorithms can be wrapped into a ThingWorx Extension or an Analytics Provider using the corresponding PTC SDKs. When choosing an integration option, customers need to carefully balance complexity of integration, constraints of their architecture, Analytics modeling complexity, as well as end user consumption requirements.

May 27, 2021

Unlocking the Power of Industrial Data Presentation by Mike Jasperson, VP of the IoT Enterprise Deployment Center his video presentation was performed at the Digital Transformations in Manufacturing conference of 2021, hosted by Enterprise Digital. In this presentation, Mike Jasperson goes over the benefits to modernizing and consolidating access to time-stamped data that is ingested from equipment and sensors into a central location like ThingWorx. Moving away from monolithic, legacy, and siloed systems, and towards more agile solutions, has never been more critical in order to increase machine, operational, and business efficiencies while also opening up visibility into data systems and infrastructure deployments. This video partners with InfluxData to help customers extract value from IoT data systems, maximizing both performance and operational capabilities of their monitoring systems. To stay competitive in the IoT market, it's important to review the best practices for scaling and testing your industrial metrics solutions, as well as how to get the best performance out of your digital data solutions by using time-series optimized databases like InfluxDB. Open source technologies discussed here are a great way to create modular and upgradable solutions and accelerate IoT innovation. (view in My Videos)

May 25, 2021

Sunshine, beach chairs & ThingWorx 9.2. What more could you need for your summer essentials? Targeted for June 2021, our next release features intelligent one-click deploy with Solution Central*, new web components, and an enhanced IAM integration! Let’s dive deeper into each. Deploy an entire solution in one click with Solution Central’s intelligent one-click deploy. Good news: you followed a modular design pattern and broke up your application into smaller libraries and components. You can now enjoy easier maintenance and re-use of your app. Bad news: your app now has 10 different dependencies, with differing versions, each with a required order to import into ThingWorx. Now, try to share these modules with colleagues, or use them on environments where code may already exist. Not exactly a day at the beach, right? Fear not, one-click deploy has you covered. You click the button, we spin through and find the right dependencies, the right versions, the right order and load them all into the target platform upon a deploy request. Solution Central one-click deploy means more sun and sand for you! Check out this post to learn more about what’s available in Solution Central 3.0! Intelligent One-Click Deploy with Solution Central Enhance your solutions with our latest web components! Imagine this: you’re a systems developer at a large parts manufacturer and your boss has asked for a detailed analysis of downtime over the last six months. Not to worry! ThingWorx 9.2 features a new waterfall chart that can be leveraged to understand dynamics in defect counts, loss reasons, time bottlenecks and other conditions. Be sure to try it out! And, while you’re at it, try out our new web components that are available now as preview: a toolbar to add key like filtering at the top of your screen or data intensive widgets (e.g., grids), a more flexible grid and a fancy new paradigm for interface developers. These three preview widgets are fully functional and tested in 9.2. Preview widgets will graduate in a future release when we add all planned functionality or address any perceived usability feedback. Don’t be afraid—it just means more good things are coming. Surf’s up, you can use these widgets safely now! New Waterfall Widget Coming in ThingWorx 9.2 Leverage new integrations with Azure Active Directory for more seamless user management. In prior releases we have offered integration to Azure Active Directory and SSO through Central Authorization Service type products or through custom authenticator extensions to ThingWorx. With our new Azure AD integration, you can cut the custom extensions and additional software out of the picture. We now accept direct SAML assertions from Azure AD directly to ThingWorx platform, which makes it that much easier to deploy your app in your organization’s SSO flow. It’s as smooth as that frosty tropical drink when the sun goes down. Like what you see? Want to try it out for yourself? ThingWorx 9.2 is targeted for June 2021, so be sure to keep a lookout on the horizon. Bump, set, spike! Stay cool & connected, Kaya

May 13, 2021

Hi everyone, We’re back! And we’ve got exciting news about Solution Central! As tempted as I am to share the news myself, I thought it only fitting to have Janie Pascoe, Product Manager of Solution Central, share the news with you. You may remember Janie from this post on ThingWorx's OPCUA functionality or this one on what it’s like to transition from a ThingWorx developer to a ThingWorx product manager. Janie, welcome back! The floor is yours. Janie, PM of Solution Central: Thank you so much Kaija. I wanted to bring your readers up to speed on some of the latest and greatest in Solution Central. If you haven’t logged in to the Solution Central portal in a while, I highly recommend you do so because you will immediately be notified of what’s new in the application as you can see here due to our newly added what’s new popup blurb! But in the spirit of giving you even more detail, let me tell you a bit more about what is new in Solution Central 3.0. This release is full of more intelligence than ever before! You can now not only deploy the solutions themselves from Solution Central but deploy all of a solution’s dependencies with the single click of a button. So, instead of having to deploy each dependency separately and in a specific order, Solution Central is now smart enough to understand the dependencies and deploy them for you. We have also added enhancements to our Solution Detail panel to make it even more intuitive and easy find what you’re looking for. And when it comes to clean up activities, we have you covered. Solution Central can now forget an instance when it’s no longer needed—no more questioning whether an instance is in active use or not. Kaya: Thanks, Janie. Exciting stuff! Readers, you can learn all about these new features and more in our Release Notes and Help Center documentation. Be sure to try out the latest functionality! Any questions, comments, or ideas for enhancements to Solution Central can be sent directly to jpascoe@ptc.com. Stay on the lookout for our next release! As always, stay connected, Kaya

May 10, 2021

Interested in learning how others using and/or hosting ThingWorx solutions can comply with various regulatory and compliance frameworks? Based on inquiries regarding the ability of customers to meet a wide range of obligations – ranging from SOC 2 to ISO 27001 to the Department of Defense’s Cybersecurity Maturity Model Certification (CMMC) – the PTC's IoT Product Management and EDC teams have collaborated on a set of detailed articles explaining how to do so. Please check out the ThingWorx Compliance Hub (support.ptc.com login required) for more information!

Apr 20, 2021

Analytics projects typically involve using the Analytics API rather than the Analytics Builder to accomplish different tasks. The attached documentation provides examples of code snippets that can be used to automate the most common analytics tasks on a project such as: Creating a dataset Training a Model Real time scoring predictive and prescriptive Retrieving the validation metrics for a model Appending additional data to a dataset Retraining the model The documentation also provides examples that are specific to time series datasets. The attached .zip file contains both the document as well as some entities that you need to import in ThingWorx to access the services provided in the examples.

Apr 16, 2021

The New and Improved DGIS Guide to ThingWorx Development Written by Victoria Firewind of the IoT EDC The classic Developing Great IoT Solutions guide has been reskinned and revamped for newer versions of ThingWorx! The same information on how to build a quality IoT application is now available for versions of ThingWorx 9.1+, and now, a complete sample application is included to demonstrate these ideas. Find within the attached archive a PDF with high-level overview information on development and application design geared towards managers and business users, so that everyone can understand the necessary requirements, common terms, and key tips on how to ensure an application is scalable and maintainable right from the very start. Reduce your chances of running into issues between PoC and Go Live by reviewing this information today! Also find within this PDF a series of tutorials which teach not just how to use the ThingWorx software, but which also educate on how to make good application design choices. A basic rules engine for sending real-time notifications is included here, as well as a complete demo application which illustrates each concept in a real-world use case. This Coffee Machine Demo App relies upon the tutorial entities, which can also now be imported directly using the other XML files provided here. This ensures that anyone can review these concepts, regardless of how much time one can commit or how much knowledge one already has on the subject. This is a complex guide, and any issues, questions, or bugs found within can be reported right here on this thread. Happy developing from the IoT EDC!

Apr 13, 2021

ThingWorx community, As part of a continuous re-evaluation of our third-party software requirements, we regularly add and remove support for versions of operating systems, persistence providers, and web browsers. On this note, we are planning to end support for Ubuntu 18.04 beginning with the ThingWorx release targeted for mid-CY2022. Per its release cycle, Ubuntu will move version 18.04 into Extended Security Maintenance at this point, meaning it will no longer receive regular maintenance updates. PTC will continue to support Ubuntu 20.04 for the immediate future and will consider supporting Ubuntu 22.04 once it becomes Generally Available (GA). Please let me know if you have any questions or concerns. Regards, Walter Haydock ThingWorx Product Management

Apr 2, 2021

ThingWorx community, As part of a continuous re-evaluation of our third-party software requirements, we regularly add and remove support for versions of operating systems, persistence providers, and web browsers. On this note, for the ThingWorx release after 9.2 (currently targeted for the end of CY 2021), we are planning to end support for Windows Server 2016. Already five years old, this product has generally been replaced by Windows Server 2019 in common usage. As per Microsoft’s published lifecycle, this version will be going out of "Mainstream Support" approximately one year after the aforementioned ThingWorx release. PTC will continue to support Windows Server 2019 for the immediate future, and will consider supporting Windows Server 2022 once it becomes Generally Available (GA). Please let me know if you have any questions or concerns. Regards, Walter Haydock ThingWorx Product Management

Mar 29, 2021

By Tim Atwood and Dave Bernbeck, Edited by Tori Firewind Adapted from the March 2021 Expert Session Produced by the IoT Enterprise Deployment Center The primary purpose of monitoring is to determine when your application may be exhausting the available resources. Knowledge of the infrastructure limits help establish these monitoring boundaries, determining straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk. For the CPU, we want to know how many cores are available to the application and potentially what the temperature is for each or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application as well as how much space remains on those. There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, which relies upon the traditional use of hardware to connect directly between operating system and hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and listed somewhere within the IT department normally. The IT team is a great resource for obtaining these limits in general, also keeping track of such things in VMware and virtualized infrastructure models. VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. This is especially so if Cloud Providers are used, given that these are scaled up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed. Lastly Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in configuration files or charts that define the container. The difficulties here center around learning what the limits are, especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make due with collecting network send and receive rates, as well as disk read and write rates, performed on the server. Cloud Providers like Azure provide VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load, for the bulk of the application, in order to accommodate a large or noisy edge fleet. The next few slides list the operating metrics and common thresholds used for each. We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance. Generally, you will want to monitor: % utilization of all CPU cores, leaving plenty of room for spikes in activity; total and used memory, ensuring total memory remains constant throughout and used memory remains below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% Memory for the OS, and for larger systems, usually around 3-4 GB. For disks, the read and write rates to ensure there is ample free space for spikes and to avoid any situation that might result in system down time; and for networking, the send and receive rates which should be below 70% or so, again to leave room for spikes. In any monitoring situation, high consistent utilization should trigger concern and an investigation into what’s happening. Were new assets added? Has any recent change caused regression or other issues? Any resent changes should be inspected and the infrastructure sizing should be considered as well. For ThingWorx specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time. In order for a monitoring set-up to be truly helpful, it needs to make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance needs to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should quickly and easily know the health of the application from a glance, without needing to drill down a lot to be made aware of issues. Likewise, the alerts that happen should be meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released in version 9.1). The monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system. This dashboard (above) is a good example of how to rollup a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30s, 3 minutes, or more to respond. Grafana is another application used for monitoring internally by our team. The easy dashboard creation feature and built-in chart modes make this tool super easy to get started with, and certainly easy to refer to from a central location over time. Setting this up is helpful for load testing and making ready an application, but it is also beneficial for continued monitoring post-go-live, and hence why it is a worthy investment. Our team usually builds a link based on the start and end time of tests for each simulation performed, with all of the various servers being monitored by one Grafana server, one reference point. Consider using PTC Performance Advisor to help monitor these kinds of things more easily (also called DynaTrace). When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet, having successful benchmarks of system performance can be a crucial part of knowing if an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and can better enable tracking of actual end user experience. The best option is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time. Another tool to make use of is JMeter, which has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live. Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test systems and real life performance. Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL Detection) on Response Time (Median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data. Monitoring typically increases system performance and availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue. PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledgebase. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst based on inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end user consumption. Don’t hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.

Mar 24, 2021

Check our expert session recorded library! The recordings will also be published in our Customer events library, posted on each event. Stay tunned! Your feedback is very important to us! After watching the recordings, please take 2 min to complete this survey Thingworx Foundation Session Name Link Duration Thingworx Mashup 101 - Do's and Don'ts Recording link 00:33:41 Thingworx Active Active Clustering (High Availability Recording link 00:26:24 Upgrade to Thingworx 9 – How to Plan / Evaluate Impacts Recording link 00:27:02 Thingworx Flow Overview Recording link 00:43:40 Top 5 items to check for Thingworx Performance Troubleshooting Recording link 00:26:55 ThingWorx DEVOPS QuickStart Guide Recording link 00:45:05 ThingWorx Backup And Recovery Recording Link 00:20:14 Expert Session - Designing your Data Model in Thingworx Recording link 00:26:45 ThingWorx Installation Recording link 00:15:07 Expert Session - Introduction To Edge Connectivity Recording link 00:15:56 Expert Session - Basic Mashup Design in Thingworx Recording link 00:36:31 Expert Session - Extensions101 Recording Link 00:30:08 Expert Session – Developing your Data Model in Thingworx Recording link 00:39:19 Thingworx Scalability Recording link 00:09:18 Expert Sessions - ThingWorx Patch Upgrade Recording link 00:03:19 Thingworx Navigate Session Name Link Duration Understanding license requirements for Thingworx Navigate Recording link 00:32:40 Navigate SSL and Authentication Recording Link 00:34:30 Navigate 3D Viewer Recording Link 00:43:25 Component Based App Development Recording Link 00:24:07 Navigate 9.0 – What’s new Recording link 00:27:07 Overview of SSO Implementation for ThingWorx Navigate and Windchill with PingFederate Recording link 00:18:36 Identifying the right SSO mix for Navigate 1 6 Recording link 00:57:56 Navigate Configuration - PingFederate Automation Script Recording link 00:51:07 Expert Session - Navigate Configuration/Windchill Authentication Recording link 00:23:07 What’s new with Navigate 1.8 and the new Navigate 1.8 installer Recording link 01:05:26 Creating an I*E task for use in Navigate Recording link 00:05:36 Vuforia Expert Capture Session Name Link Duration VEC In a Nutshell Video Link 00:31:39

Mar 16, 2021

We will host a live Expert Session: "Top 5 Thingworx environment monitoring best practices" on March 25th, 10h00 EST. Please find below the description of the expert session and the registration link. Expert Session: Top 5 Thingworx environment monitoring best practices Date and Time: March 25th, 10h00 EST Duration: 1 hour Host: Tori Firewind, Tim Atwood and Dave Bernbeck from Enterprise Deployment Center Registration Here: https://www.ptc.com/en/resources/iiot/webcast/top-5-thingworx-monitoring-best-practices In this session, we will be reviewing the main monitoring practices to keep a heathy environment and discuss the main issues from the audience. Bring your questions!. Existing Recorded sessions can be found on support portal using the keyword ‘Expert Sessions’. You can also suggest topics for upcoming sessions using this small form. Here are some recorded sessions that might be of your interest. You can find recordings for the full library of webinars using the keyword ‘Expert Sessions’ in PTC support portal search Thingworx Active Active Clustering This session will cover the main aspects of the High Availability Clustering feature launched with the ThingWorx 9.0 release. Recoding Link Upgrade to Thingworx 9 – How to Plan / Evaluate Impacts This session highlights the key points you should evaluate to properly plan your upgrade to Thingworx 9. Recording Link Top 5 items to check for Thingworx Performance Troubleshooting How to troubleshoot performance issues in a Thingworx Environment? Here we cover the top 5 investigation steps that will help you understand the source of your environment issues and allow better communication with PTC Technical Support Recording Link

Mar 16, 2021

IoT & Connectivity Tips

Thundering Herd Scenarios in ThingWorx

ThingWorx Docker Overview and Pitfalls to Avoid

Using the multivariate Azure Anomaly Detector with ThingWorx

5 Common Mistakes for Developing Scalable IoT Applications

Optimizing ThingWorx query performance with the InfluxDB REST API

ThingWorx 9.2 is Here!

Live Webinar on June 22nd 11am EST: 5 Common Mistakes for Developing Scalable IoT Applications

PTC Community Spotlight: ThingWorx Part I

Cloud Design Patterns for Resiliency, Scale, Performance, ...

Five ways to integrate external Analytics such as Python, R, Matlab in the ThingWorx Platform

Unlocking the Power of Industrial Data

ThingWorx 9.2 Preview: One-Click Deploy, Waterfall Chart & Expanded IAM Functionality

What's New in Solution Central 3.0!

Achieving Regulatory Compliance with ThingWorx

Analytics Services Examples

Introducing the New and Improved DGIS Guide to ThingWorx Development

Plans to drop ThingWorx support for Ubuntu 18.04 in mid-CY2022

Plan to drop ThingWorx support for Windows Server 2016

Top 5 Monitoring Best Practices

Webinar / Expert Sessions Recording Library

Live Webinar: Top 5 Thingworx environment monitoring best practices

ThingWorx Learning Paths

Transaction Handling in ThingWorx

Getting Started on the ThingWorx Platform Learning Path