cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

Community Tip - Visit the PTCooler (the community lounge) to get to know your fellow community members and check out some of Dale's Friday Humor posts! X

IoT Tips

Sort by:
JMeter for ThingWorx Overview Apache JMeter is an open-source tool designed for load testing and measuring the performance of a web application. JMeter has a wide range of features to facilitate this testing, including support for a variety of server and protocol types, a full-featured testing IDE with the ability to record the test steps from both a browser or a native application, and built-in debugging tools. Information about JMeter can be found on Apache’s website.   Working with JMeter is not always intuitive, but it also isn’t that much harder than regular software development. Take some time to explore the official Apache JMeter Documentation and figure out where things go and how to mechanically make use of the JMeter IDE. Then step through this tutorial to create a basic test that logins to ThingWorx, accesses a mashup, and clicks on a few widgets. This is the first in a series to come, courtesy of IoT EDC Engineer Tim Atwood ( @atwood ) and the whole EDC team.   Installation Download JMeter from Apache’s website. Unpack the archive and copy the files to a desired location. Run the application by double clicking on the “ApacheJMeter.jar” file within the bin directory. JMeter is now installed and ready to use. Creating a Test Set up a proxy in your browser of choice (or on the OS in settings).   Select the green “templates” icon in JMeter, and then select “Recording” for the template.   Configure the recording template to point towards your ThingWorx Navigate or Foundation server, then click “Create”. Hit “Start” under the “HTTP(S) Test Script Recorder” tab of the new JMeter project. Make sure the port is set correctly under Global Settings.   A pop-up box will appear that always stays visible on top of the active browser window, so that the recording can be controlled and stopped at any time. Leave the “Transaction name” field empty so that each transaction recorded by the software is automatically named after the web request (this helps differentiate one from the other, and they can each be renamed later).   Open your browser, and navigate (via direct URL if possible, to keep things simple) to the mashup you wish to test. Login and let the page load. Click on anything you’d like on the mashup to capture the activity of that test. Then click “Stop” on the pop-up recorder window to stop the recording. Each transaction will be assigned an index as well, and the source code behind each of these transactions can be reviewed and manually modified in the main JMeter window. Here is the login request for instance:   The HTTP Authorization Manager is used to automatically authorize a defined user login for the thread to any of the Base URLs listed. In this case, though, there are two separate servers being accessed during the test, and one may need to be added manually:   Save the project before continuing, as manual modifications come next.   Within the task page as you do the recording, a set of parameters or body data will be recorded. Modifying this is how you want to parametrize the test scenario, variables like the username and password. To simulate logging in as other users, you have to parameterize this, and not rely on the administrator account name and password entered into the browser.   Rename the task controller to “MyTasks” or something more easily identified than the long string it has now:   Some recorded items like static images and stylesheets will be non-essential, things the browser processes for better graphical representation, but which are often cached and do not greatly affect the scalability results of the test. These can be highlighted and disabled all at once:   Also ensure that any cascading stylesheets have been disabled. Enable the “View Results Tree” to ensure you can review the results of the test script during the editing phase. However, this “Listener” element has a high memory footprint during test execution, so it should be disabled before running an actual scale test.   Next we need to parametrize the user login information and pull it from a csv file.   The colon means that “Administrator” is the default user to use for login.   You can add other properties as well, like ramp up time, run time, number of users, and protocols to use. The ramp up time determines how quickly the threads are allocated for the test, which if done slowly enough, prevents the thundering herd scenario. In more complex scenarios, logic controllers can be inserted to control the flow of the test. This allows for options such as if-then conditions for different user permissions, or parameter-based routes for better randomization of actions in different threads. This will be covered in more detail in a future article.   Pre- and Post-Processors can be used as well, with the latter being used here much more than the former, to extract information from the response, in order to then use that as part of the variables going into one of the follow up requests. For example, see the script in this image: This one has a variable that it extracts from the object number property, defined in the CSV file, and converts it into another variable that is used in subsequent scripts. This script uses the object number reference to pull the name out of the body data and make the request, which is then post-processed by a bunch of these extractors. One is a JSON extractor which is trying to get an ID out of the JSON response. There is a regular expression extractor and a bean shell post-processor, which populates some variables based on what it responded with. Once it extracts all of the variables from the response to this particular request (GetSearchResults in this case), it then tailors the additional requests based on these. -   Customize the script according to the needs of your own application. Alternate between recording and manually modifying the recording code to ensure the test performs exactly as required and from the perspective of different users with different permissions. Also vary the type of activity performed on the mashup. Highlight the “View Results Tree” tab and click the green start button at the top of the window to see the results appear.     If you are getting an unauthorized message, ensure that the scope is right for the login information, which may require moving the “HTTP Authentication Manager” component around in the project. Be sure to check the URLs and credentials entered for each type of user. Occasionally the recorder will insert a long authentication string into the URL, and you want to manually set the URL for the credentials to the most generic URL possible for the server. This can be parametrized too: Referencing the CSV file defined here: Which looks like this for a more complicated scenario (covered in the future):  The columns here represent the username, password, object number in Windchill, and object name in Windchill, as well as the wait time used to vary the way the logic is executed and some extra variables which differentiate for the switches what to do to create a more varied and realistic test.   Conclusion Following these steps again and again on the various mashups throughout an application can ensure that a script for each web page and each type of user on each web page is created and added to the testing suite. This results in a load test that is perfectly representative of the real-world user load placed on an application. Load testing is a critical part of the development lifecycle in any application, and ThingWorx is no exception. Any further questions about the capabilities of JMeter not covered here, can be answered by the whole JMeter user manual, found on the Apache website. Future articles will include some basic scripts that test basic things, which can serve as an example for more complex ThingWorx JMeter script development. Here is an example of one tool PTC uses for internal QA of ThingWorx, designed to load test a Navigate application (specifically its built-in mashups):   Something similar to this tool may be available for public use later this summer. In the meantime, feel free to use the tutorial above to create scripts of your own. Any issues building your custom load tests in JMeter can be discussed right here on this thread with our JMeter experts. Happy developing!
View full tip
ThingWorx DevOps By Victoria Firewind,  IoT EDC This presentation accompanies a recent Expert Session, with video content including demos of the following topics:   found here!   DevOps is a process for taking planned changes through development, through testing, and into production,   where they can be accessed by end users.   One test instance typically has automated tests (integration testing) which ensure application logic is preserved in spite of whatever changes the developers are making, and often there is another test instance to ensure the application is usable (UAT testing) and able   to handle a production load (load testing).   So, a DevOps Pipeline starts with a task manager tasking out planned changes, where each task will become a branch in the repository. Each time a new branch is created, a new pipe is needed, which in this case, is produced by Docker Hub.   Developers then make changes within that pipe, which then flow along the pipe into testing. In this diagram, testing is shown as the valve which when open (i.e. when tests all pass) then   allows the changes to flow along the pipe into production.   A good DevOps process has good flow along the various pipelines, with as much automated or scripted as possible to reduce the chances for errors in deployments.   In order to create a seamless pipeline, whether or not it winds up automated, several third party tools are useful:       b                  Container software is a very good way to improve the maintainability and updatability of a ThingWorx instance, while minimizing the amount of resources needed to host each component.   n  1. Create Docker Image Consult the Help Center if need be. Update your YML file with everything you need before starting the image: see the example in the PTC community.   License the instance using the license management website. Follow the instructions from Docker for installing those tools: Docker itself (docker) and Docker Compose (docker-compose).   n   2. Save Docker Image in Docker Repo Docker Hub has some free options, and if a license is purchased,   can host more than a single Docker image and tag. It is also possible to set up your own Docker registry.           n 3. Access the image in Docker Desktop Download Docker Desktop and sign-in to the Docker account which hosts the repository.   Create some folders for storing the h2.env file and the ThingworxStorage and ThingWorxPlatform mounted folders.   Remember to license these containers as well. Developers login to the license management site themselves and put those into the ThingWorxPlatform mounted folder (“license_capability_response.bin”).         Git is a very versatile tool that can be used through many different mediums, like Azure DevOps or Github Desktop.  To get started as a totally new Git user, try downloading Github Desktop on your local machine and create a local repository with the provided sample code.    This can then be cloned on a Linux machine, presumably whichever instance hosts the integration ThingWorx instance, using the provided scripts (once they are configured).  Remember to install Git on the Linux machine, if necessary (sudo apt-get install git).     A sample ThingWorx application (which is not officially supported, and provided just as an example on how to do DevOps related tasks in ThingWorx) is attached to this post in a zip file, containing two directories, one for scripts and one for ThingWorx entities.   Copy the Git scripts and config file into the top level, above   the repository folder, and update the GitConfig.sh file with the URL for your Git repository and your login credentials. Then these scripts can be used to sync your Linux server with your Git repository, which any developer could easily update from their local machine. This also ensures changes are secure, and enables the potential use of other DevOps procedures like tasks, epics, and corresponding branches of code.     Steps to DevOps using the provided code as an example: Clone the repository into the SystemRepository or any other created repository, use the provided scripts in a Linux environment. Import the DeploymentUtilities entity, which again is scripted for Linux or for use with a development IDE with bash support. Then import the ThingWorx application from source control or use the script (which itself makes use of that DeploymentUtilities entity). Now create some local changes, add things, etc. and try out the UpdateApplication script or export to source control and then push to the Git repo. Data and localization table exports are also possible. Run the tests using the provided IntegrationTester thing or create your own by overriding the IntegrationTestTS thing shape, or use the TestTwxApplication script from a Linux terminal. Design a process for your application which  allows for easy application exports and updates to and from a repository, so that developers can easily send in their changes, which can then be easily loaded and tested in another environment.   In Conclusion: DevOps is a complex topic and every PTC customer will have their own process based around their unique requirements and applications. In the future, more mature pipeline solutions will be covered, ones that involve also publishing to Solution Central for easier deployment between various testing instances and production.        
View full tip
Smoothing Large Data Sets Purpose In this post, learn how to smooth large data sources down into what can be rendered and processed more easily on Mashups. Note that the Time Series Chart  widget is limited to load 8,000 points (hard-coded). This is because rendering more points than this is almost never necessary or beneficial, given that the human eye can only discern so many points and the average monitor can only render so many pixels. Reducing large data sources through smoothing is a recommended best practice for ThingWorx, and for data analysis in general.   To show how this is done, there are sample entities provided which can be downloaded and imported into ThingWorx. These demonstrate the capacity of ThingWorx to reduce tens of thousands of data points based on a "smooth factor" live on Mashups, without much added load time required. The tutorial below steps through setting these entities up, including the code used to generate the dummy data.   Smoothing the Data on Mashups Create a Value Stream for storing the historical data. Create a Data Shape for use in the queries. The fields should be: TestProperty - NUMBER timestamp - DATETIME Create a Thing (TestChartCapacityThing) for simulating property updates and therefore Value Stream updates. There is one property: TestProperty - NUMBER - not persistent - logged The custom query service on this Thing (QueryNamedPropertyHistory) will have the logic for smoothing the data. Essentially, many points are averaged into one point, reducing the overall size, before the data is returned to the mashup. Unfortunately, there is no service built-in to do this (nothing OOTB service). The code is here (input parameters are to - DATETIME; from - DATETIME; SmoothFactor - INTEGER): // This is just for passing the property name into the query var infotable = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "NamedProperties"}); infotable.AddField({name: "name", baseType: "STRING"}); infotable.AddRow({name: "TestProperty"}); var queryResults = me.QueryNamedPropertyHistory({ maxItems: 9999999, endDate: to, propertyNames: infotable, startDate: from }); // This will be filled in below, based on the smoothing calculation var result = Resources["InfoTableFunctions"].CreateInfoTable({infotableName: "SmoothedQueryResults"}); result.AddField({name: "TestProperty", baseType: "NUMBER"}); result.AddField({name: "timestamp", baseType: "DATETIME"}); // If there is no smooth factor, then just return everything if(SmoothFactor === 0 || SmoothFactor === undefined || SmoothFactor === "") result = queryResults; else { // Increment by smooth factor for(var i = 0; i < queryResults.rows.length; i += SmoothFactor) { var sum = 0; var count = 0; // Increment by one to average all points in this interval for(var j = i; j < (i+SmoothFactor); j++) { if(j < queryResults.rows.length) if(j === i) { // First time set sum equal to first property value sum = queryResults.getRow(j).TestProperty; count++; } else { // All other times, add property values to first value sum += queryResults.getRow(j).TestProperty; count++; } } var average = sum / count; // Use count because the last interval may not equal smooth factor result.AddRow({TestProperty: average, timestamp: queryResults.getRow(i).timestamp}); } } Create a Timer for updating the property values on the Thing. The Timer should subscribe to itself, containing this code (ensure it is enabled as well): var now = new Date(); if(now.getMilliseconds() % 3 === 0) // Randomly reset the number to simulate outliers Things["TestChartCapacityThing"].TestProperty = Math.random()*100; else if(Things["TestChartCapacityThing"].TestProperty > 100) Things["TestChartCapacityThing"].TestProperty -= Math.random()*10; else Things["TestChartCapacityThing"].TestProperty += Math.random()*10; Don't forget to set the runAsUser in the Timer configuration. To generate many properties, set the updateRate to a small value, like 10 milliseconds. Disable the Timer after many thousands of properties are logged in the Value Stream. Create a Mashup for displaying the property data and capacity of the query to smooth the data. The Mashup should run the service created in step 4 on load. The service input comes from widgets on the mashup: Bindings: Place a Time Series Chart widget in the bottom of the Mashup layout. Bind the data from the query to the chart. View the Mashup. Note the difference in the data... All points in one minute: And a smooth factor of 10 in one minute: Note that the outliers still appear, and the peaks are much easier to see. With fewer points, trends become easier to spot and data is easier to understand. For monitoring the specific nature of the outliers, utilize alerts and other types of displays. Alternative forms of data reduction could involve using the mean of each interval (given by the smoothing factor) or the min or max, as needed for the specific use case. Display multiple types of these options for an even more detailed view. Remember, though, the more data needs to be processed, the slower the Mashup will load. As usual, ensure all mashups are load tested and that the number of end users per Mashup is considered during application design.
View full tip
ThingWorx Docker Overview and Pitfalls to Avoid    by Tori Firewind of the IoT EDC Containers are isolated and can run side-by-side on the same machine, but they share the host OS, making them more efficient in terms of memory usage and scalability.   Docker is a great tool for deploying ThingWorx instances because everything is pre-packaged within the Docker image and can be stored in a repository ready for deployment at any time with little configuration required.  By using a different container for every component of an application, conflicting dependencies can be avoided. Containers also facilitate the dev ops process, providing consistent application deployments which can be set up, taken down, and tested automatically using scripts.   Using containers is advantageous for many reasons: simplified configuration, easier dev ops management, continuous integration and deployment, cost savings, decreased delivery time for new application versions, and many versions of an application running side-by-side without any wasted resources setting them up or tearing them down. The ThingWorx Help Center is a great resource for setting up Docker and obtaining the ThingWorx Docker files from the PTC Software Downloads website. The files provided by PTC handle the creation of the image entirely, simplifying the process immensely. All one has to do is place the ThingWorx version and all of the required dependencies in the staging folder, configure the YML file, and run the build scripts. The Help Center has all of the detailed information required, but there are a few things worth noting here about the configuration process.   For one thing, the platform-settings.json file is generated based on the options given in the YML file, so configuration changes made within this configuration file will not persist if the same options aren’t given in the YML file. If using Docker Desktop to run an image on a Windows machine, then the configuration options must be given in an ENV file that can be referenced from the command used to start the image. The names of the configuration parameters differ from the platform-settings.json file in ways that are not always obvious, and a full list can be found here.   For example, if extension imports need to be enabled on a ThingWorx instance running in Docker, then the EXTPKG_IMPORT_POLICY_ENABLED option must be added to the environment section of the YML file like this:     environment: - "CATALINA_OPTS=-Xms2g -Xmx4g" # NOTE: TWX_DATABASE_USERNAME and TWX_DATABASE_PASSWORD for H2 platform must # be set to create the initial database, or connect to a previous instance. - "TWX_DATABASE_USERNAME=dbadmin" - "TWX_DATABASE_PASSWORD=dbadmin" - "EXTPKG_IMPORT_POLICY_ENABLED=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JARRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_CSSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSONRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_WEBAPPRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_ENTITIES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_EXTENTITIES=true" - "EXTPKG_IMPORT_POLICY_HA_COMPATIBILITY_LEVEL=WARN" - "DOCKER_DEBUG=true" - "THINGWORX_INITIAL_ADMIN_PASSWORD=Pleasechangemenow"   Note that if the container is started and then stopped in order for changes to the YML file to be made, the license file will need to be renamed from "successful_license_capability_response.bin" to "license_capability_response.bin" so that the Foundation server can rename it. Failing to rename this file may cause an error to appear in the Application Log, and the server to act as if no license was ever installed: "Error reading license feature info for twx_realtime_data_sub".   In Docker Desktop on a Windows machine, create a file called whatever.env and list the parameters as shown here: Then, reference this environment file when bringing up the machine using the following command in Powershell:      docker run -d --env-file h2.env -p 8080:8080 -v ${pwd}/ThingworxPlatform:/ThingworxPlatform -v ${pwd}/ThingworxStorage:/ThingworxStorage -it <image_id>     Notice in this command that the volumes for the ThingworxPlatform and ThingworxStorage folders are specified with the “-v” options. When building the Docker image in Linux, these are given in the YML file under the volumes section like this (only change the path to local mount on the left side of the colon, as the container mount on the right side will never change):      volumes: - ./ThingworxPlatform:/ThingworxPlatform - ./ThingworxStorage:/ThingworxStorage - ./tomcat-logs:/opt/apache-tomcat/logs     Specifying the volumes this way allows for ThingWorx logs and configuration files to be accessed directly, a crucial requirement to debugging any issues within the Foundation instance. These volumes must be mapped to existing folders (which have write permissions of course) so that if the instance won’t come up or there are any other issues which require help from Tech Support, the logs can be copied out and shared. Otherwise, the Docker container is like a black box which obscures what is really going on. There may not be any errors in the Docker logs; the container may just quit without error with no sign of why it won’t stay up. Checking the ThingWorx and Tomcat logs is necessary to debugging, so be sure to map these volumes correctly.   Once these volumes are mapped and ThingWorx is successfully making use of them, adding a license file to the Docker instance is simple. Use the output in the ThingworxPlatform folder to obtain the device ID, grab a valid license file, and put it right back into that ThingworxPlatform folder, exactly the same way as on a regular instance of ThingWorx. However, if the Docker image is being used for a dev ops process, a license may not be necessary. The ThingWorx instance will work and allow development for a time before the trial license expires, which normally will be enough time for developers to make their changes, push those changes to a repository, and tear the container down.   Another thing worth noting about ThingWorx Docker image creation is that the version of Java supplied in the staging folder must match the compatibility requirements for each version of ThingWorx. This is the version of Java used by the container to run the Foundation server. In versions of ThingWorx 9.2+, this means using the Amazon Corretto version of Java. The image absolutely will not start ThingWorx successfully if older versions of Java are used, even if the scripts do successfully build the image.   Also note that in the newer versions of ThingWorx Docker, the ThingWorx Foundation version within the build.env file is used throughout the Docker image creation process. Therefore, while the archive name can be hard-coded to whatever is desired, the version should be left as is, including any additional specifications beyond just the version number. For example, the name of the archive can be given as Thingworx-Platform-H2-9.2.0.zip (a prettier version of the archive name than is used by default), but the PLATFORM_VERSION should still be set to 9.2.0-b30 (which should be how it appears within the build.env file upon download of the ThingWorx Docker files).   Paying attention to every note in the Help Center is critically important to using ThingWorx Docker, as the process is extensive and can become very complicated depending on how the image will be used. However, as long as the volumes are specified and the log files accessible, debugging any issues while bringing up a Docker-contained ThingWorx instance is fairly straightforward.     Credits: Images borrowed from ThingWorx Docker Containerization Tech Talk by Adrian Petrescu
View full tip
5 Common Mistakes to Developing Scalable IoT Applications by Tori Firewind and the IoT EDC Team Introduction To build scalable applications, it’s necessary to identify common mistakes and avoid them at the early stages of development. In an expert session this past month, the PTC Enterprise Deployment Team elaborated on why scalability is important and how to avoid the common development pitfalls in IoT. That video presentation has been adapted here for visual consumption of the content as well.   What is Scalability and Why Does it Matter Enterprise ready applications can scale and easily be maintained, which is important even from day 1 because scalability concerns are the largest cause for delays to Go Lives.  Applications balance many competing requirements, and performance testing is crucial to ensure an application is ready for Go Live. However, don't just test how many remote assets can connect at once, but also any metrics that are expected to increase in time, like the number of remote properties per thing, the frequency of reporting from those properties, or the number of users accessing the system at once. Also consider how connecting more assets will affect the user experience and business logic, and not just the ability to ingest data.   Common Mistake 1: Edge Property Updates Because ThingWorx is always listening for updates pushed from the Edge and those resources are always in use, pulling updates from the Foundation side wastes resources. Fetch from remote every read is essentially a round trip, so it's slower and more memory intensive, but there are reasons to do it, like if the quality tag is needed since the cache doesn't store it. Say a property is pushed at 11:01, and then there's a network issue at 11:02. If the property is pulled from the cache, it will pull the value sent at 11:01 without any indication of there being a more recent value on the Edge device. Most people will use the default options here: read from server cache, which relies on the Edge to push updates, and the VALUE push type, and configuring a threshold is a good idea as well. This way, only those property updates which are truly necessary are sent to the Foundation server. Details on property aspects can be found in KCS Article 252792.   This is well documented in another PTC Community post. This approach is necessary and considered a best practice if there is event logic which depends on multiple properties at once. Sending all of the necessary properties to determine if an event should fire in one Infotable ensures there is no need to query the database each time a property update comes in from the Edge, which ensures independent business logic and reduces the load on the database to improve ingestion performance. This is a very broad topic and future articles will address it more specifically. The When Disconnected property aspect is a good way to configure what happens with Edge property values in a mass disconnect scenario. If revenue depends on uptime, consider losing any data that changes while a device is disconnected. All of the updates can be folded into a single value if the changes themselves aren't needed but an updated value is needed to populate remote properties upon reconnect. Many customers will want to keep all of their data, even when a device is offline and use data stores. In this case, consider how much data each Edge device can store (due to memory limitations on the devices themselves), and therefore how long an outage can last before data is lost anyway. Also consider if Foundation can handle massive spikes in activity when this data comes streaming in. Usually, a Connection Server isn't enough. Remember that the more data needs to be kept, the greater the potential for a thundering herd scenario.   Handling a thundering herd scenario goes beyond sizing considerations. It is absolutely crucial to randomize the delay each device will wait before attempting to reconnect. It should be considered a requirement to have the devices connect slowly and "ramp up" over time for multiple reasons. One is that too much data coming in too fast could overwhelm the ingestion queue and result in data loss. Another is that the business logic could demand so many system resources, that the Foundation server crashes again and again and cannot be recovered. Turning off the business logic it isn't possible if the downtime is unexpected, so definitely rely instead on randomized reconnection times for Edge devices.   Common Mistake 2: Overlooking Differences in HA To accommodate a shared thing model across many servers, changes had to be made in how the thing model is stored and the model tree is walked by the Foundation servers. Model information is no longer cached at the Thing level, and the model tree is therefore walked every time model information is needed, so the number of times a Thing is directly referenced within each service should be limited (see the Help Center for details).   It's best to store whatever information is needed from a Thing in an Infotable, making the Things[thingName] reference a single time, outside of any loops. Storing the property definitions outside of the loop prevents the repetitious Thing references within the service, which otherwise would have occurred twice for each property (for both the name and the description), and then again for every single property on the Thing, a runtime nightmare.   Certain states previously held in memory are now shared across the cluster, like property values, Thing states, and connection statuses. Improvements have been made to minimize the effects of latency on queries, like how they now only return property values on associated Thing Shapes or Thing Templates. Filtering for properties on implementing Things is still possible, but now there is a specific service to do it, called GetThingPropertyValues (covered in detail in the Help Center).    In the script shown above, the first step is a query to get the names of all implementing things of a particular Thing Shape. This is done outside of any loops or queries, so once per service call. Then, an Infotable is built to store what would have been a direct reference to each thing in a traditional loop. This is a very quick loop that doesn't add much by way of runtime since it is all in memory, with no references to the thing model or the database, instead using the results of the first query to build the Infotable. Finally, this thing reference Infotable is passed into the new service GetThingPropertyValues to retrieve all of the property info for all of these things at once, thereby only walking the thing model once. The easiest mistake people would make here is to do a direct thing reference inside of a loop, using code like Things[thingName].Get() over and over again, thereby traversing the thing model repeatedly and adding a lot of runtime. QueryImplementingThingsOptimized is another new service with new parameters for advanced configuration. Searches can now be done on particular networks or to particular depths, and there's an offset parameter that allows for a maximum number of items to be returned starting at any place in the list of Things, where previously if you needed the Things at the end of the list, you had to return all of the Things. All of these options are detailed in the Help Center, as well as the restrictions listed in the image above.    Common Mistake 3: Async Service Misuse   Async services are sometimes required, say if a user has to trigger many updates on many remote things at once by the click of a button on a mashup that should not be locked up waiting for service completion. Too many async service calls, though, result in spikes in activity and competition for resources. To avoid this mistake, do not use async unless strictly necessary, and avoid launching too many async threads in parallel. A thread dump will show how many threads there are and what they are doing.   Common Mistake 4: Thread Pool Overload Adding more threads to the pool may be beneficial in certain circumstances, like if the threads are waiting on other resources to complete their tasks, look stuff up in the database (I/O), or unlock data that can only be accessed one thread at a time (property writes). In this case, threads are waiting on other resources, and not the CPU, so adding more threads to the pool can improve performance. However, too many threads and performance degradation will occur due to increased contention, wasted CPU cycles, and context switches.   To check if there are too many or not enough threads in the pool, take thread dumps and time the completion of requests in the system. Also watch the subsystem memory usage, and note that the side of the queue should never approach the max. Also consider monitoring the overall performance of the system (CPU and Memory) with a tool like Grafana, and remember that a good performance test properly exercises all of the business logic and induces threads in a similar way to real world expectations.   Common Mistake 5: Stream Etiquette Upserts, or updates to database tables, are expensive operations that can interfere with ingestion if they are performed on the wrong tables. This is why Value Stream and Stream data should never be updated by end users of the application. As described in the DGIS document on best practices, aggregation is the key to unlocking optimal performance because it reduces the size of database tables that require upserts. Each data structure shown here has an optimal use in a well-designed ThingWorx application.   Data Tables are great for storing overview information on all of the Things in one view, and queries on this data source are the fastest. Update this data source as often as possible (by timer), allowing enough time for updates to be gathered and any necessary calculations made. Data Tables can also be updated by end users directly because each row locks one at a time during updates. Data Tables should be kept as small as possible to improve performance on mashups, so for instance, consider using one to show all Things per region if there are millions of Things. Roll up information is best stored here to avoid calculations upon mashup load, and while a real-time view of many thousands of things at once is practically impossible, this option allows for a frequently updates overview of many things, which can also drill down to other mashup views that are real-time for one Thing at a time.   Value Streams are best used for data ingestion, and queries to these should be kept to a minimum, largely performed by the roll up logic that populates the Data Tables mentioned above. Queries that chart all of the data coming in are best utilized on individual Thing views so that only a handful of users are querying the same data sources at a time. Also be sure to use start and end dates and make use of the "source" field to improve query performance and create a better user experience. Due to the massive size of the corresponding database tables, it's best to avoid updating Value Streams outside of the data ingestion process altogether.   Streams are similar, but better for storing aggregated, historical data. Usually once per day or per week (outside of business hours if possible), Value Stream data will be smoothed or reduced into less data points and then stored into Streams. This allows for data to be stored for longer periods of time on the server without using up as much memory or hurting query performance. Then the high volume ingested data sources can be purged frequently, as discussed below.   Infotables are the most memory intensive, and are really designed to hold only a small number of rows at a time, usually to facilitate the business logic. Sometimes they will be stored in Streams or Data Tables if they aren't expected to grow larger (see the DGIS Coffee Machine App for an example). Infotables should never be logged; if they are used to transmit Edge property updates (like in the Property Set Approach), they should be processed into other logged (usually local) properties.   Referring to the properties themselves is how to get real-time information on a mashup, say by using the GetProperties service and its auto-update option, which relies on internal websockets. This should be done on individual Thing views only, and sizing considerations need to be made if there will be many of these websockets open at once, say if there are many end users all viewing real-time data at a time.   In the newer versions of ThingWorx, these cannot be updated directly, so find the system object called ThingWorxPersistenceProvider and use the service UpdateStreamDataProcessingSettings. ThingWorx Foundation processes data received from remote devices in batches in order to manage the data flow and reduce database churn. All of these settings configure how large those batches are and how frequently they are flushed to the database (detailed in full in KCS Article 240607). This is very advanced configuration that heavily depends on use case and infrastructure, but some info applies to most people: adjusting the scan rate is usually not beneficial; a healthy queue should never approach the max limit; and defaults differ by database because they function differently. InfluxDB generally works better when there are less processing threads and higher numbers of things per thread, while PostgresDB can have a lot of threads, preferably with less things per thread. That's why the default values shown here are given as the same number of threads (and this can be changed), but Influx has a larger block size and size threshold because it can handle more items per thread. Value Streams ingest all data into the Foundation server, and so the database tables that correspond with these data sources grow very large, very quickly and need to be purged often and outside of business hours, usually once a day or once per week. That's why it's important to reduce the data down to less points and push them into Streams for historical reference. For a span of years, consider a single point a day might be enough, for a span of hours, consider a data point a minute. Push aggregated data into Streams and then purge the rest as soon as it is no longer needed.   In Conclusion
View full tip
ThingWorx Performance Monitoring with Grafana authored by EDC team member Desheng Xu ( @xudesheng )   Monitoring ThingWorx performance is crucially important, both during the load testing of a newly completed application, and after the deployment of new code in an existing application. Monitoring performance ensures that everything works as expected at the Enterprise level.  This tutorial steps you through configuring and installing a tool  which runs on the same network as the ThingWorx instance. This tool collects data from the Platform and translates it into something visual and easy to understand via Grafana.    tsample is  small and customizable, and it plays a similar role to telegraf. Its focus is on gathering ThingWorx performance metrics. Historically, this tool also supported collecting OS level performance metrics, but this is no longer supported. It is highly recommended to collect OS level performance metrics by using telegraf, a tool designed specifically for that purpose (and not discussed here). This is not the only way to go about monitoring ThingWorx performance, but this tool uses a very good approach that has been proven effective both at customer sites and internally by PTC to monitor scale tests.   Find the most recent release here.   Recommended Deployment Architecture tsample can be deployed in the same box where ThingWorx Tomcat is running, but it's recommended to deploy it on a separated box to minimize any performance impact caused by the collector. tsample supports export to InfluxDB and/or local file. In this document, it is assumed that InfluxDB will be used for monitoring purpose. Please note that this is not the same instance of InfluxDB being used by ThingWorx (if configured). This article will not cover setting up InfluxDB or NGINX (if necessary), so please configure these before beginning this tutorial.   Supported Platform tsample has been tested on Windows 2016, MacOS 10.15, Ubuntu 16.04, and Redhat 7.x.  It's anticipated to work on a more general Ubuntu/Redhat/Mac/Windows release as well. Please leave a comment or contact the author, @xudesheng , if Raspberry Pi support is needed.    Configuration File Where to Store the Configuration File tsample will pick up the configuration file in the following sequence: from the command line...   ./tsample -c <path to configuration file>​     from the environment... Linux:   export TSAMPLE_CONFIG=<path to configuration file> ./tsample​   Windows:   set TSAMPLE_CONFIG=<path to configuration file> tsample.exe   from a default location... tsample will try to find a file with the name "config.toml " from the same folder in which it starts.   How to Craft a Configuration File You can use following command to generate a sample file:     ./tsample -c config.toml -e     or:     ./tsample -c config.toml --export       A file with the name "config.toml " will be generated with a sample configuration. You can then adjust its content in accordance with the following.   Configuration File Content Format Configuration file must be in toml format. title and owner sections Both sections are optional. The intention of these two sections is to support doc tool in future. TestMachine section This is section is required, and it defines where this tool will run.   thingworx_servers section This section is where you define targeted ThingWorx applications. Multiple ThingWorx servers can be defined with the same or different metrics to be collected.   thingworx_servers.metrics sections Underneath each thingworx_servers section, there are several metrics. In default example, following metrics have been included: ValueStreamProcessingSubsystem DataTableProcessingSubsystem EventProcessingSubsystem PlatformSubsystem StreamProcessingSubsystem WSCommunicationsSubsystem WSExecutionProcessingSubsystem TunnelSubsystem AlertProcessingSubsystem FederationSubsystem You can add your own customized metrics, as long as the result follows the same Data Shape. The default Data Shape has 3 columns: If the output Data Shape exceeds this limit, the tool will likely not work properly.   result_export_to_db section This section defines the target InfluxDB as a sink of collected performance metrics.   result_export_to_file section This section defines the target file storage for collected performance metrics.   Grafana Configuration Example Monitor Value Stream Step 1. Connect Grafana to InfluxDB   Step 2: Create a New Dashboard   Step 3. Create a New Query Depending on which metrics you defined to collect in the tsample configuration file, you will see a different choice of measurement in Grafana. Here, we will use ValueStreamProcessingSubsystem as an example.   Step 4. Choose the Right Platform and Storage Provider Some metrics depend on the database storage provider, like Value Stream and Stream.   Step 5. Choose the Metrics Figures   Select "remove" to get rid of the default 'mean' calculation. Select "non_negative_difference" from Transformations. Using this transformation, Grafana can show us the speed of writes.     Then, remove the default GROUP BY "time" clause. Assign a meaningful alias of this query.   Step 6. Add Another Query You can add another query as 'Value Stream Queued Speed' by following the same steps.   Step 7. Assign a Panel Title   Step 8. Review the Result Let's go back to the dashboard page and select "last 15 minutes" or "last 5 minutes" from the top right corner. It should show a result similar to the chart below.   Step 9. Save the Dashboard Don't forget to save your dashboard before we add more panels.   Step 10. Refine the Panel It's difficult to figure out the high-level write speed from the above panel, so let's enhance it. Add a new query with the following configuration: In the above query, there are two additional figures: 20s and 1m... How do you choose? 20s should be the same as sampling_cycle_inseconds in your tsample configuration file. If you choose a different value, then you could end up with misleading results. Larger values such as "1m" may give you a smoother result, but they could also hide system instability. Going larger than 1m is not recommended in most cases. With this new query, it's much easier to figure out what the average write speed in current testing is.   Tips: if your sampling_Cycle_inseconds is 30s, then you may not need this additional query. The following image is a sample at the 30s interval time. You would not need an additional average query to get a smooth write speed.   The next example is a sample at the 10s interval time. Without additional queries, you may not be able to get a meaningful understanding of the write speed. From the above three examples, it's recommended to configure the sampling interval time at 30s, or anything larger than 20s. You can then choose whether you need additional queries based on the visualization result.   Step 11. Further Refinement The above charts illustrate the queuing and writing speed. However, it is possible that the Value Stream may perform at a reasonable speed, but the Value Stream queue may be growing and could exceed its capacity. Let's add another query to monitor this: However, it is difficult to read this chart, since it has a different value range on the y-axis: Let's move this query to a second y-axis on the right: This will make the view much easier to see: The current queue size or remaining queue size will always move up and down; it is healthy as long as it does not continue to grow to a high level.   What Else Can Be Monitored? The following metrics would be monitored by most customers: Value Stream write and queue speed Value Stream queue size Stream write and queue speed Stream queue size Event performed speed (completedTaskCount) Event submitted speed (submittedTaskCount) Event queue size Websocket communication Websocket connection   ThingWorx Memory Usage Monitoring Create a new panel and add a new query: In a running system, memory usage will always move up and down - at times sharply or quickly - when the system is busy. The system is healthy as long as memory doesn't go up continuously or stay at a maximum for a long period of time.   Conclusion Setting up monitoring is absolutely crucial to managing the performance of an enterprise ThingWorx application. Using Grafana makes tracking and visualizing the performance much easier. Stay tuned to the EDC tag for more monitoring tips to come!
View full tip
Remote Monitoring of Assets Benchmark   As @ttielebein introduced previously, one of the missions of the IOT Enterprise Deployment Center (EDC) is to publish benchmarks that showcase the ThingWorx Platform deployed to solve real-world IOT business problems.    Our goal is that these benchmarks can be used as a reference or baseline for architects working on their own implementations... showing not only a successful at-scale implementation, but also what happens when that same implementation is pushed to ...or even past... it's limits.   Please find the first installment attached - a reference benchmark demonstrating ThingWorx deployed to monitor 15,000 assets with a high-volume of data properties per asset.  Over 250 hours of simulations were conducted as part of producing this benchmark.   The IOT EDC team will be monitoring this post (as well as our other posts in the IOT Tech Tips forum) to answer any questions we can about the approaches taken in designing, deploying and simulating this implementation.    As the team will publish more benchmarks like this will be published in the future, we also greatly value any feedback you have that can help us to improve the content for future documents.
View full tip
The Property Set Approach This article details an approach developed by Prachi Rath and Roy Clarke, refined by the EDC team in the December 2019's Remote Monitoring of Assets Reference Benchmark , and used to handle multi-property business rules in an Enterprise ThingWorx application.   Introduction If there are logic rules which depend upon multiple properties, and each property receives its updates one at a time, then each property will need to have an identical subscription, because there is no way for any one subscription to know the most up-to-date values for the other properties. This inefficient approach would create redundancy and sizing constraints, reducing the capacity of the application to scale up to the Enterprise level. The Property Set Approach resolves this issue by sending in all property updates as one Info Table or JSON property (called the “property set”), which can then have a single subscription. The property set is assembled on the Edge when an update needs to be sent, and then the Platform dissects, processes, and stores the data within this property set as required by the business logic.   This approach also involves caching the last property value into a runtime variable so that it can be referenced within the business logic subscription without having to be retrieved from the database. This can significantly improve the runtime of the subscription, reducing the number of resources required to sustain the business logic and ensuring that any alerts or events resulting from the business logic occur as soon as possible. It also reduces the load on the database, ensuring that data ingestion can complete unhindered.   So, while there are many benefits to this approach, it is also more complicated. It tightly couples the development of the Edge and Platform code and increases the application complexity, making it slightly less easy to maintain the application in the long run. The property set also requires a little more bandwidth and a more stable internet connection between Edge devices and the Platform since there is more metadata in an Info Table property, and therefore every update is slightly larger than it would be otherwise. So this approach is only recommended when multi-property rules are a requirement of the application and a stable internet connection exists between the Edge and Platform.   Platform Implementation I. Create an Info Table (or JSON) Property This tutorial uses the out-of-the-box Data Shape called NamedVTQ for the Info Table property, which is defined on a Thing Template as a remote property. It is important that this is not marked as persistent or logged, as the purpose is to reduce the amount of database writes and reads required by the Platform. The Info Table property has the following property definition:     <PropertyDefinition aspect.dataChangeType="ALWAYS" aspect.dataShape="NamedVTQ" aspect.isPersistent="false" baseType="INFOTABLE" isLocalOnly="false" name="numberPropertySetAsInfotable"/>       II. Create a Data Change Event Subscription for the Info Table Property The subscription has three parts: Cache the last value for the property in a runtime variable Start off the business rules processing, sending in the whole Info Table Send the Info Table to be logged as individual local properties in the database     // First step caches the last Value, refer to the next step… // Second step sets off the business rules processing with the Info Table me.ScaleTestBusinessRuleForPropertySetAsInfotable({ PropertySetAsInfotable: eventData.newValue.value }); // Third step sends the Info Table as one property into a service which parses it into the // individual properties, updating both the runtime properties on the remote thing and the database me.UpdatePropertyValues({ values: eventData.newValue.value /* INFOTABLE */ });       III. Set-Up Caching Each property which needs to be cached should be created on the Thing Template level and named in a similar way, say by placing the word “Last” at the end, such as “Property1” => “Property1Last”, “Property2” => “Property2Last”, etc. This property should NOT be logged or persistent, as the point of this is to store the most recent value in memory, removing any superfluous dependency on database queries in the process. Note that while storing the property in runtime memory makes it much more accessible, it also means that the property needs to be rewritten manually upon Platform restart. Additional code (not provided here) must be written to populate these properties from the database upon application start-up.   The following code should be placed in the data change event subscription (option 1 in the case where only a few properties need caching, or option 2 if every property value needs to be cached):   Option 1: Some but Not All Properties Need Caching     // Names of properties for which you want to cache the last value var propertyNames = ['number1', 'number2']; // Loop through the properties and cache their time if they are found in the property set propertyNames.map(assignLast); // This function can be split into two functions for Age and Last separately if need be function assignLast(propertyName) { logger.debug("Looping for property -> "+ propertyName); var searchprop = new Object(); searchprop.name = propertyName; property = eventData.newValue.value.Find(searchprop); if(property){ logger.debug("Found Row. Name= " + property.name); var lastPropertyName = propertyName+"Last"; if(property.value) { // Set the cache property on me, this entity, to the current property value me[lastPropertyName] = me[propertyName]; } } else { logger.debug("Property Not Found in property set -> " + propertyName); } }       Option 2: All Properties Need Caching     var rowCount = eventData.newValue.value.getRowCount(); for(var i=0; i<rowCount; i++){ logger.warn("property name->" + eventData.newValue.value[i].name + "----- property new value->" + eventData.newValue.value[i].value.value); var propertyName = eventData.newValue.value[i].name; var lastPropertyName = propertyName+"Last"; me[lastPropertyName] = me[propertyName]; logger.warn("done last subscription, last property value for lastPropertyName" + me[lastPropertyName]); }         Useful Platform Code Snippets I. Age Calculation     var date1 = new Date(); var date2 = me.GetPropertyTime({ propertyName: propertyName /* STRING */ }); var result = millisToMinutesAndSeconds (dateDifference(date1, date2) ); // This function converts from an unintelligibly large number in milliseconds to something formatted in minutes and seconds function millisToMinutesAndSeconds(millis) { var minutes = Math.floor(millis / 60000); var seconds = ((millis % 60000) / 1000).toFixed(0); return (seconds == 60 ? (minutes+1) + ":00" : minutes + ":" + (seconds < 10 ? "0" : "") + seconds); }       II. Sort the Info Table by Time     var params = { sortColumn: "time" /* STRING */, t: me.propertySet/* INFOTABLE */, ascending: ascending /* BOOLEAN */ }; var result = Resources["InfoTableFunctions"].Sort(params);       III. Search the Info Table for a Property     var searchprop = new Object(); searchprop.name = propertyName; property = PropertySetAsInfotable.Find(searchprop); if(property === null){ logger.info("Property Not Found -> " + propertyNumber1); } else { logger.info("Found Row. Name= [" + property.name + "], value= " + property.value.value); }         Edge Implementation This example implementation uses the .NET Edge SDK to build a property set Info Table at the Edge.   I. Define the Data Shape A standard Data Shape is used (NamedVTQ), but because this Data Shape is not exposed in the Edge SDK code, it has to be created manually.     // Data Shape definition for NamedVTQ FieldDefinitionCollection namedVTQFields = new FieldDefinitionCollection(); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_NAME, BaseTypes.STRING)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_VALUE, BaseTypes.VARIANT)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_TIME, BaseTypes.DATETIME)); namedVTQFields.addFieldDefinition(new FieldDefinition(CommonPropertyNames.PROP_QUALITY, BaseTypes.STRING)); base.defineDataShapeDefinition("NamedVTQ", namedVTQFields);     II. Define the Info Table Property The property defined should NOT be logged or persistent, and it can be read-only, since data is always pushed from the Edge and read from the server cache when accessed on the Platform. Note that the push type of the info table property MUST be set to "ALWAYS" (if set to "VALUE", the data change event will only fire if the number of rows changes).   // Property Set Definitions [ThingworxPropertyDefinition( name = "DevicePropertySet", description = "Alternative representation of properties as an Info Table for rules processing", baseType = "INFOTABLE", category = "Status", aspects = new string[] { "isReadOnly:true", "isPersistent:false", "isLogged:false", "dataShape:NamedVTQ", "cacheTime:0", "pushType:ALWAYS" } ) ]     III. Define a Property to Store the GOOD Quality Status   private static String QUALITY_STATUS_GOOD = QualityStatus.GOOD.name();     IV. Define Functions to Populate the Value Collections  An Info Table is really just made up of many Value Collections, where each Value Collection is considered a row. These services take in the name and value of a property and return a Value Collection object which can be added to the property set Info Table.   public ValueCollection createNumberValueCollection(String name, double value) { ValueCollection vc = new ValueCollection(); // Add quality and time entries to the Value Collection vc.SetStringValue(CommonPropertyNames.PROP_QUALITY, QUALITY_STATUS_GOOD); vc.SetDateTimeValue(CommonPropertyNames.PROP_TIME, new DatetimePrimitive(DateTime.UtcNow)); vc.SetStringValue(CommonPropertyNames.PROP_NAME, name); vc.SetNumberValue(CommonPropertyNames.PROP_VALUE, value); return vc; } public ValueCollection createBooleanValueCollection(String name, Boolean value) { ValueCollection vc = new ValueCollection(); // Add quality and time entries to the Value Collection vc.SetStringValue(CommonPropertyNames.PROP_QUALITY, QUALITY_STATUS_GOOD); vc.SetDateTimeValue(CommonPropertyNames.PROP_TIME, new DatetimePrimitive(DateTime.UtcNow)); vc.SetStringValue(CommonPropertyNames.PROP_NAME, name); vc.SetBooleanValue(CommonPropertyNames.PROP_VALUE, value); return vc; }     V. Build the Property Set Call this code from the processScanRequest method to build the property set.   // Create an instance of a new Info Table using the standard "NamedVTQ" Data Shape InfoTable propertySet = new InfoTable(getDataShapeDefinition("NamedVTQ")); // Set name/value for Temperature using convenience function propertySet.addRow(createNumberValueCollection("Temperature", temperature)); // Set name/value for Pressure using convenience function propertySet.addRow(createNumberValueCollection("Pressure", pressure)); // Set name/value for TotalFlow using convenience function propertySet.addRow(createNumberValueCollection("TotalFlow", this._totalFlow)); // Set name/value for InletValve using convenience function propertySet.addRow(createBooleanValueCollection("InletValve", inletValveStatus)); // Set name/value for FaultStatus using convenience function propertySet.addRow(createBooleanValueCollection("FaultStatus", faultStatus)); // Set the property set Info Table property base.setProperty("DevicePropertySet", propertySet);     VI. Update the subscribed properties These two lines of code update the properties and events, actually sending the property set (containing all property updates) to the Platform.   base.updateSubscribedProperties(15000); base.updateSubscribedEvents(60000);     Conclusion Following these steps will enable the Edge to build a property set before sending any property updates to the Platform. The Platform can then rely on caching to process the business logic with no database dependency, which is faster and more efficient than any other approach. Finally the updates are still written to the database, so in the end, there is no functional difference between using a property set and binding each property individually. Please don't hesitate to comment here with any questions about this approach.
View full tip
By Tim Atwood and Dave Bernbeck, Edited by Tori Firewind Adapted from the March 2021 Expert Session Produced by the IoT Enterprise Deployment Center The primary purpose of monitoring is to determine when your application may be exhausting the available resources. Knowledge of the infrastructure limits help establish these monitoring boundaries, determining straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk.   For the CPU, we want to know how many cores are available to the application and potentially what the temperature is for each or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application as well as how much space remains on those.   There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, which relies upon the traditional use of hardware to connect directly between operating system and hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and listed somewhere within the IT department normally. The IT team is a great resource for obtaining these limits in general, also keeping track of such things in VMware and virtualized infrastructure models.   VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. This is especially so if Cloud Providers are used, given that these are scaled up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed.   Lastly Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in configuration files or charts that define the container.   The difficulties here center around learning what the limits are, especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make due with collecting network send and receive rates, as well as disk read and write rates, performed on the server.   Cloud Providers like Azure provide VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load, for the bulk of the application, in order to accommodate a large or noisy edge fleet. The next few slides list the operating metrics and common thresholds used for each. We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance.   Generally, you will want to monitor: % utilization of all CPU cores, leaving plenty of room for spikes in  activity; total and used memory, ensuring total memory remains constant throughout and used memory remains below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% Memory for the OS, and for larger systems, usually around 3-4 GB.    For disks, the read and write rates to ensure there is ample free space for spikes and to avoid any situation that might result in system down time;  and for networking, the send and receive rates which should be below 70% or so, again to leave room for spikes.   In any monitoring situation, high consistent utilization  should trigger concern and an investigation into  what’s happening. Were new assets added? Has any recent change caused regression or other issues?    Any resent changes should be inspected and the infrastructure sizing should be considered as well. For ThingWorx specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time. In order for a monitoring set-up to be truly helpful, it needs to make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance needs to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should quickly and easily know the health of the application from a glance, without needing to drill down a lot to be made aware of issues. Likewise, the alerts that happen should be  meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released in version 9.1). The  monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system.     This dashboard (above) is a good example of how to  rollup a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30s, 3 minutes, or more to respond.   Grafana is another application used for monitoring internally by our team. The easy dashboard creation feature and built-in chart modes make this tool  super easy to get started with, and certainly easy to refer to from a central location over time. Setting this up is helpful for load testing and making ready an application, but it is also beneficial for continued monitoring post-go-live, and hence why it is a worthy investment. Our team usually builds a link based on the start and end time of tests for each simulation performed, with all of the various servers being monitored by one Grafana server, one reference point.   Consider using PTC Performance Advisor to help monitor these kinds of things more easily (also called DynaTrace). When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet, having successful benchmarks of system performance can be a crucial part of knowing if an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and can better enable  tracking of actual end user experience. The best  option is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time.   Another tool to make use of is JMeter, which has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live.   Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test systems and real life performance. Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL Detection) on Response Time (Median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data. Monitoring typically increases system performance  and availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue.   PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledgebase. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst  based on inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end user consumption.   Don’t hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.  
View full tip
Thread Safe Coding, Part 1: The Java Extension Approach Written by Desheng Xu and edited by @vtielebein    Overview Time and again, customers report that one of their favorite ThingWorx features is using However, the Javascript language doesn't have a built-in semaphore locker mechanism, nothing to enable thread-safe concurrent processing, like you find in the Java language. This article demonstrates why thread safe coding is necessary and how to use the Java Extension for this purpose. Part 2 presents an alternative approach using database lockers.   Demo Use Case Let's use a highly abstracted use case to demo thread-safe code practices: There are tens of machines in a factory, and PLC will emit a signal to indicate an issue happens during run-time. The customer expects to have a dashboard that shows today's total count of issues from all machines in real-time. The customer is also expecting that a timestamp of each issue can be logged (regardless of the machine). Similar use cases might be to: Show the total product counts from each sub-line in the current shift. Show the total rentals of bicycles from all remote sites. Show the total issues of distant cash machines across the country.   Modeling Thing: DashboardCounter, which includes: 1 Property: name:counter, type:integer, logged:true, default value:0 3 services: IncreaseCounter(): increase counter value 1 GetCounter(): return current counter value ResetCounter(): set counter value to 0 1 Subscription: a subscription to the data change event of the property counter, which will print the new value and timestamp to the log.   GetCounter var result = me.counter;   IncreaseCounter me.counter = me.counter + 1; var result = me.counter;   ResetCounter me.counter = 0; var result = 0;   Subscription MonitorCounter Logger.info(eventData.newValue.value+":"+eventData.newValue.time.getTime());   ValueStream For simplicity, the value stream entity is not included in the attachment. Please go ahead and assign a value stream to this Thing to monitor the property values.   Test Tool A small test tool mulreqs is attached here, along with some extensions and ThingWorx entities that are useful. The mulreqs tool uses a configuration file from the OS variable definition MULTI_REQUEST_CONFIG.   In Linux/MacOS: export MULTI_REQUEST_CONFIG="./config.json" in config.json file, you can use the following configuration:       { "host":"twx85.desheng.io", "port":443, "protocol":"https", "endpoint":"/Thingworx/Things/DashboardCounter/services/IncreaseCounter", "headers":{ "Content-Type":"application/json", "Accept": "application/json", "AppKey":"5cafe6eb-adba-41df-a7d6-4fc8088125c1" }, "payload":{}, "round_break":50000, "req_break":0, "round_size":50, "total_round":20 }       host, port, protocol, headers are very identical to define a ThingWorx server. endpoint defines which service is called during the test. payload is not in use at this moment but you have to keep it here. total_round is how many rounds of the test you want to run. round_size defines how many requests will be sent simultaneously during each round. round_break is the pause time during each round in Microseconds, so 50000 in the above example means 50ms. req_break is 0, this is the delay between requests. "0" means requests to the server will happen simultaneously.   The expectation from the above configuration is service execution a total of 20*50 times, 1000 times. So, we can expect that if the initial value is 0, then counter should be 1000 at the end, and if the value stream is clean initially, then the value stream should have a history from 1 to 1000.   Run Test Use the following command to perform the test: .<your path>/mulreqs Execution output will look like:   Check Result You will be surprised that the final value is 926 instead of 1000. (Caution: this value will be different in different tests and it can be any value in the range of 1 and 1000). Now, look at the value stream by using QueryPropertyHistory. There are many values missing here, and while the total count can vary in different tests, it is unlikely to be exactly the last value (926). Notice that the last 5 values are: 926, 925, 921, 918. The values 919, 920, 922, and 923 are all missing. So next we check if there are any errors in the script log, and there are none. There are only print statements we deliberately placed in the logs. So, we have observed two symptoms here: The final value from property counter doesn't have the expected value. The value stream doesn't have the expected history of the counter property changes. What's the reason behind each symptom, and which one is a thread-safe issue?   Understanding Timestamp Granularity ThingWorx facilitates the collection of time series data and solutions centered around such data by allowing for use of the timestamp as the primary key. However, a timestamp will always have a minimal granularity definition when you process it. In ThingWorx, the minimal granularity or unit of a timestamp is one millisecond.   Looking at the log we generated from the subscription again, we see that several data points (922, 923, 924, 925) have the same timestamp (1596089891147), which is GMT Thursday, July 30, 2020, 6:18:11.147 AM. When each of these data points is flushed into the database, the later data points overwrite the earlier ones since they all have the same timestamp. So, data point 922 went into the value stream first, and then was overwritten by data point 923, and then 924, and then 925. The next data point in the value stream is 926, which has a new timestamp (1596089891148), 1ms behind the previous one. Therefore, data points 925 and 926 are stored while 922, 923, 924 are not. These lost data points are therefore NOT a thread-safe issue.   The reason why some of these data points have the same timestamp in this example is because multiple machines write to the same value stream. The right approach is to log data points at the individual machine level, with a different value stream per machine.   However, what happens if one machine emits data too frequently? If data points from the same machine still have a timestamp clash issue, then the signal frequency is too high. The recommended approach would be to down-sample the update frequency, as any frequency higher than 1000Hz will result in unexpected results like these.   Real Thread Safe Issue from Demo Use Case The final value of the counter being an arbitrary random number is the real thread-safe coding issue. if we take a look at the code again: me.counter = me.counter + 1; This piece of code can be split into three-piece: Step 1: read current value of me.counter Step 2: increase this value Step 3: set me.counter with new value. In a multi-threaded environment, not performing the above three steps as a single operation will lead to a race issue. The way to solve this issue is to use a locking mechanism to serialize access to the property, which will acquire the lock, perform the three operations, and then release the lock. This can be done using either the Java Extension or the database thing to leverage the database lock mechanism.   Use Java Extension to Handle Thread Safe Challenge This tutorial assumes that the Eclipse plug-in for ThingWorx extension development is already installed. The following will guide you through creating a simple Java extension step by step: Create a Java Extension Project Choose the minimal ThingWorx version to support and select the corresponding SDK. Let's name it JavaExtLocker, though it’s best to use lower-case in the project name. Add a ThingWorx Template in the src Folder Right-click the src folder and a a Thing Template. Add a Thing property Right click on the Java source file created in the above step and click the menu option called Thingworx Source, then select Add Property. Add Three Services: IncreaseCounter, GetCounter, ResetCounter Right click the Java source file and select the menu option called Thingworx source, then select Add Service. See above for the IncreaseCounter service details. Repeat these same steps to add GetCounter and ResetCounter: (Optionally) Add a Generated Serial ID Add Code to the Three Services @SuppressWarnings("deprecation") @ThingworxServiceDefinition(name = "IncreaseCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" }) @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {}) public synchronized Integer IncreaseCounter() throws Exception { _logger.trace("Entering Service: IncreaseCounter"); int current_value = ((IntegerPrimitive (this.getPropertyValue("Counter"))).getValue(); current_value ++; this.setPropertyValue("Counter", new IntegerPrimitive(current_value)); _logger.trace("Exiting Service: IncreaseCounter"); return current_value; } @ThingworxServiceDefinition(name = "GetCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" }) @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {}) public synchronized Integer GetCounter() throws Exception { _logger.trace("Entering Service: GetCounter"); int current_value = ((IntegerPrimitive)(this.getPropertyValue("Counter"))).getValue(); _logger.trace("Exiting Service: GetCounter"); return current_value; } @SuppressWarnings("deprecation") @ThingworxServiceDefinition(name = "ResetCounter", description = "", category = "", isAllowOverride = false, aspects = {"isAsync:false" }) @ThingworxServiceResult(name = "Result", description = "", baseType = "INTEGER", aspects = {}) public synchronized Integer ResetCounter() throws Exception { _logger.trace("Entering Service: ResetCounter"); this.setPropertyValue("Counter", new IntegerPrimitive(0)); _logger.trace("Exiting Service: ResetCounter"); return 0; }​ The key here is the synchronized modifier, which is what allows for Java to control the multi-threading to prevent data loss. Build the Application Use 'gradle build' to generate a build of the extension. Import the Extension into ThingWorx Create Thing Based on New Thing Template Check the New Thing Property and Service Definition Use the Same Test Tool to Run the Test Again { "host":"twx85.desheng.io", "port":443, "protocol":"https", "endpoint":"/Thingworx/Things/DeoLockerThing/services/IncreaseCounter", "headers":{ "Content-Type":"application/json", "Accept": "application/json", "AppKey":"5cafe6eb-adba-41df-a7d6-4fc8088125c1" }, "payload":{}, "round_break":50000, "req_break":0, "round_size":50, "total_round":20 } ​ Just change the endpoint to point to the new thing.  Check the Test Result Repeat the same test several times to ensure the results are consistent and expected (and don't forget to reset the counter between tests). Summary of Java Extension Approach The Java extension approach shown here uses the synchronized keyword to thread-safe the operation of several actions. Other options are to use a ReentryLock or Semaphore locker for the same purpose, but the synchronized keyword approach is much cleaner.   However, the Java extension locker will NOT work in 9.0 horizontal architecture since Java doesn't a have distributed locker. IgniteLocker wouldn't work in the current horizontal architecture, either. So if using a thread-safe counter in version 9.0+ horizontal architecture, then leverage the database thing, as discussion below.
View full tip
When to Include InfluxDB in the ThingWorx Development Lifecycle (this article is also available for download as a PDF attached)   The Short Answer InfluxDB is a time series database designed specifically for data ingestion. Historically, InfluxDB has been viewed as a high-scale expansion option for ThingWorx: a way to ensure the application works as intended, even when scaled up to the enterprise level. This is certainly one way to view it, because when there are many, many remote things, each with a lot of properties writing to the Platform at short intervals, then InfluxDB is a sure choice. However, what about in smaller applications? Is there still a benefit to using an optimized data ingestion tool in any case? The short answer is: yes, there is!   Using InfluxDB for optimized data ingestion is a good idea even in smaller-sized applications, especially if there are plans to scale the application up in the future. It is far better to design the application around InfluxDB from the start than to adjust the data model of the application later on when an optimized data ingestion process is required. PostgreSQL and InfluxDB simply handle the storage of data in different ways, with the former functioning better with many Value Streams, and the latter with fewer Value Streams. Switching the way data is retained and referenced later, when the application is already on the larger side, causes delays in growing the application larger and adding more devices. Likewise, if the Platform reaches its ingestion limits in a production environment, there can be costly downtime and data loss while a proper solution (which likely involves reworking the application to work optimally with InfluxDB) is implemented.   Don’t think that InfluxDB is for expansion only; it is an optimized ingestion database that has benefits at every level of the application development lifecycle. From the end to end, InfluxDB can ensure reliable data ingestion, reduced risk of data loss, and reduced memory and CPU used by the deployment overall. Preliminary sizing and benchmark data is provided in this article to explain these recommendations. Consider how ThingWorx is ingesting data now, how much CPU and other resources are used just for acquiring the data, and perhaps InfluxDB would seem a benefit to improve application performance.   The Long Answer In order to uncover just how beneficial InfluxDB can be in any size application, the IoT Enterprise Deployment Center has run some simulations with small and medium sized applications. The use case in the simulation is simple with user requests coming from a collection of basic mashups and data ingestion coming from various numbers of things, each with a collection of “fast” and “slow” properties which update at different rates. This synthetic load of data does not include a more complete application scenario, so the memory and CPU usage shown here should not be used as sizing recommendations. For those types of recommendations, stay tuned for the soon-to-release ThingWorx 9.0 Sizing (or check out the current 8.5 Sizing Guide).   Comparing Runs When determining the health of the ThingWorx Platform, there are several categories to inspect: Value Stream Queue Rate and Queue Size, HTTP Requests, and the overall Memory and CPU use for each server. Using Grafana to store the metrics results in charts like those below which can easily be compared and contrasted, and used to evaluate which hardware configuration results in the best performance. The size of the numbers on the vertical axis indicate total numbers of resources used for that metric, while the slope or trend of each chart indicates bottlenecks and inadequate resource allocation for the use case.   In this case, all darker charts represent data from PostgreSQL ONLY configurations, while the lighter charts represent the InfluxDB instances. Because this is not a sizing guide, whether each of these charts comes from the small or medium run is unimportant as long as they match (for valid comparisons between with Influx and without it). The smaller run had something like 20k Things, and the larger closer to 60k, both with 275 total Platform users (25 Admins) and 3 mashups, which were each called at various refresh rates over the course of the 1-hour testing period. Note that in the PostgreSQL ONLY instances, there were more Thing Templates and corresponding Value Streams. This change is necessary between runs because only with fewer Value Streams does InfluxDB begin to demonstrate notable improvements.   The most important thing to note is that the lighter charts clearly demonstrate better performance for both size runs. Each section below will break down what the improvement looks like in the charts to show how to use Grafana to verify the best performance.   Value Stream Queue The vertical axis on the Value Stream Queue Rate chart shows how many total writes per minute (WPS) the Platform can handle. The average is 10 WPS higher using InfluxDB in both scenarios, and InfluxDB is also much more stable, meaning that the writes happen more reliably. The Value Stream Queue Size chart demonstrates how well the writes within the queue are processed. Both of these are necessary to determine the health of data ingestion.   If the queue size were to increase and trend upward in the lighter Queue Size chart, then that would mean the Platform couldn’t handle the higher ingestion rate. However, since the Queue Size is stable and close to 0 the entire time, it is clear that the Platform is capable of clearing out the Value Stream Queue immediately and reliably throughout the entire test. FIGURE 1 – THESE REPRESENT THE DATA GETTING STORED INTO THE DATA PROVIDER. NOTE: THE FORMER IS MUCH LOWER THAN THE LATTER.   FIGURE 2 – NOTE THE DATA LOSS IN THE NON-INFLUX INSTANCE (THE QUEUE IN GREEN REACHES THE MAX IN YELLOW). THE INFLUX INSTANCE HAS LESS TROUBLE CLEARING OUT THE QUEUE, AS DEMONSTRATED BY THE CONSISTENTLY LOW QUEUE SIZE.   HTTP Requests Taking the strain of ingestion off of the Platform’s primary database frees its resources up for other activities. This in turn improves the performance and reliability of the Platform to respond to HTTP requests, those which in a typical application are used to aggregate data into smaller data stores (depending on the use case) and which render the mashups for the end users. The business logic and mashups can be more complex when there is one database designated for ingestion (InfluxDB) and one for everything else (PostgreSQL). FIGURE 3 – THE DARKER CHART SHOWS A LOT OF CHOPPINESS, MEANING THAT WHILE THE PLATFORM WAS RESPONDING THE WHOLE TIME, IT WAS NOT DOING SO RELIABLY. THE SMOOTHER SECOND CHART SHOWS HOW MUCH EASIER THE PLATFORM CAN HANDLE THESE REQUESTS WHEN THE LOAD IS DISTRUBITED INTELLIGENTLY ACROSS MULTIPLE SERVERS, EACH OPTIMIZED FOR THE TYPE OF DATA THEY RECEIVE. THE “STAIRCASE” SHAPE OCCURS BECAUSE THE SIMULATOR INCREASES THE WORK LOAD EVERY 10 MINUTES UNTIL IT BREAKS.   Likewise, the nature of Postgres lends well towards this differentiation, given that there are many more database tables required for supporting the HTTP requests, something Postgres does well. That leaves Influx to handle the time-series data and ingestion, and those are the primary strengths of that software as well. So, splitting the load across multiple servers in this way results in smaller server sizes overall, each which is stream-lined and optimized to handle exactly what it is given by the Platform.   Note that in both of these charts, there are no bad requests, so both would seem to be successful runs. However, as future charts will demonstrate more clearly, there is a catastrophic failure when the load is increased around 12:30p. The simulation ends before the server begins to show any real symptoms of the issue, and that is why there are no bad requests. The maximum Operations Per Second (OPS) in the Hardware Specifications and Performance section is taken from before the failure begins.   Clearly the InfluxDB instance has better performance given that the average Operations Per Second (OPS) is substantially higher, nearly 4 times what is seen in the PostgreSQL ONLY instance. Obviously how well the Platform manages the business logic and mashup loading will depend on a lot of factors. In this test scenario, the OPS was increased by increasing the mashup refresh rate on the InfluxDB instances (which could handle over double the operations). Likewise, the number of Stream writes to the PostgreSQL database could be double what it was when PostgreSQL was the only database. Therefore, configuring InfluxDB for the data ingestion and leaving Postgres for the rest of the application certainly makes the load much easier on the Platform, and the same would be true even in a much more complex scenario.   Memory and CPU The important thing here is to keep the memory use low enough that any spikes in usage won’t cause a server malfunction. CPU Usage should stay at or below around 75%, and Memory should never exceed around 80% of the total allocated to the server. The sizing guides can help determine what this allocation of memory needs to be.   Of note in these charts is the slight, upward slope of the CPU usage in the darker chart, indicating the start of a catastrophic failure, and the difference in the total memory needed for the ThingWorx Platform and Postgres servers when Influx is used or not. As is apparent, the servers use much less memory when the database load is split up intelligently across multiple servers.   FIGURE 4 – THE THINGWORX CPU IS ABOUT THE SAME HERE AS IN THE INFLUXDB CONFIGURATION BELOW BUT LOOK AT HOW MUCH MORE MEMORY BOTH THE PLATFORM AND THE POSTGRES DATABASE NEED ALLOCATED TO THEM IN THIS CONFIGURATION (64 GB A PIECE). ALSO NOTE THE JUMP IN CPU AND MEMORY USAGE AFTER 12:30P. THIS IS REFERENCED IN THE PREVIOUS SECTION, AND THE SLOPE UPWARD OF THE USAGE AFTER THAT POINT INDICATES THE START OF A CATASROPHIC FAILURE. THE TEST ENDS TOO SOON TO SEE ANY SYMPTOMS OF FAILURE, BUT IT IS A SURE THING AFTER THE INCREASE IN LOAD AROUND 12:30P. FIGURE 5 – INFLUX NEEDS AN EXTRA SERVER, BUT THE SIZE OF THE INFLUX AND POSTGRES SERVERS TOGETHER IS LESS THAN HALF THE SIZE AS THAT REQUIRED FOR THE SINGLE POSTGRES DATABASE IN THE POSTGRES ONLY CONFIGURATION (8 GB). THINGWORX IS SMALLER TOO (32 GB).     Hardware Specifications and Performance These are the exact specifications for each simulated instance, broken down by size and whether InfluxDB is configured or not. Note that some of the hardware specifications may be more than is necessary real-world use case depending. As stated previously, this document is not a sizing guide (use the official ThingWorx Sizing Guide). Note that the maximum number of WPS and OPS are shown here. The maximums are higher in the InfluxDB scenarios, meaning that even with smaller-sized servers, the InfluxDB configurations can handle much greater loads.   Summary In conclusion, if InfluxDB may at some point be needed in the lifecycle of an application, because the expected number of things or the number of properties on each thing is large enough that it will max the limitations of the Platform otherwise, then InfluxDB should be used from the very start. There are benefits to using InfluxDB for data ingestion at every size, from performance to reliability, and of course the obviously improved scalability as well.   Reworking the application for use with InfluxDB later on can be costly and cause delays. This is why the benefits and costs associated with an InfluxDB-centric hardware configuration should be considered from the start. More servers are required for InfluxDB, but each of these servers can be sized smaller (depending on the use case), and all of this will affect the overall cost of hosting the ThingWorx application. The benefits of InfluxDB are especially pronounced when used in conjunction with clusters, which will be demonstrate fully in the 9.0 Sizing Guide (soon to be released). If InfluxDB is used to interface with the clusters, then there are even more resources to spare for user requests.   It is considered ThingWorx best practice for high ingestion customers to make use of InfluxDB in applications of any size. Note, though, that this will mean the number of Value Streams per Influx Database will need to be limited to single digits. We hope this helps, and from everyone here at the EDC, happy developing!
View full tip
Developing Great IoT Solutions Brought to you once again by your EDC team, find attached here a brand-new, comprehensive overview of ThingWorx best practices! This guide was crafted by combining all available feedback, from support cases to PTC Community threads, and tapping all internal resources. Let this guide serve to bridge the knowledge gaps ThingWorx developers most commonly see.    The Developing Great IoT Solutions (DGIS) Guide is a great way to inform both business and technically minded folks about the capabilities of the ThingWorx Platform. Learn how to design good solutions from a high-level, an overview designed specifically with the business audience in mind. Or, learn how to implement good IoT designs through a series of technical examples. Start from very little knowledge of the Platform and end up understanding data structures and aggregation, how to use the collection widget, and how to build a fully functional rules engine for sending and acknowledging alerts in ThingWorx.   For the more advanced among us, check out the Appendix. Find here a handy list of do's and don'ts surrounding ThingWorx best practice in development, with links to KCS, Help Center, and Community content.   Reinforce your understanding of the capabilities of the ThingWorx Platform with this guide, today!   A big thanks to all who were involved on this project! Happy developing!
View full tip
Announcing: ThingWorx Solution Central 3.1.0 and its New API Written by: Tori Firewind of the IoT EDC   Solution Central 3.1.0 ThingWorx Solution Central (SC) is the solution management tool for ThingWorx and Digital Performance Management (DPM), the latest version of which (DPM 1.1) can now be deployed directly from the PTC Solutions menu of SC. Streamlining packaging strategies and ensuring efficient solution deployment can now be done for all kinds of ThingWorx solutions, even those with heavy customization. The new API allows for updated building blocks from within the PTC Solutions menu to be easily discovered and deployed right alongside custom solutions. Even the most advanced developers can now house their deployment management process within SC.   As discussed in a previous article, Solution Central forms a necessary part of a mature DevOps pipeline, usually as a set of services within Foundation which finalize and publish the solution to the Solution Central servers. The recommendation to utilize Solution Central from within Foundation remains a best practice for ThingWorx DevOps because the vast majority of solutions benefit from using the ThingWorx APIs, which scan and check for dependencies and proper XML formatting on each included entity.   Packaging and publishing the solution from within ThingWorx is the easiest and most straightforward way recommended by PTC, but it is now possible to publish to Solution Central using a standard API for those who need to publish from Jenkins or other build jobs. If there are legacy extensions, 3rd party tool dependencies, or other customizations within the ThingWorx application, then it may be beneficial to use this new API instead.   The new API also allows for editable extensions and entities within a published solution, though PTC still recommends avoiding this as a general practice. It is usually better for purposes of maintainability and ease of upgrades to just publish the solution again (with an incremented) version each time any changes are made.   How to Use the API Within Solution Central, a new menu option has been added to review the API, with information about the different types of requests and their parameters and responses. To access this within the Solution Central UI, open the help menu and navigate to “Public APIs” (see the image on the right). To see a sample response, select a request type and scroll down to the “Responses” section (shown below). Examples of error responses are also provided, and it’s important to ensure that whatever makes these requests can properly report or log any errors for troubleshooting and maintenance of the DevOps process.   The general steps for making use of this API are as follows: Create the solution resource with a POST Add some files to that solution with PUTs First, create the solution archive with the right Solution Identifiers; this should contain at least one project XML and all of the entities belonging to that ThingWorx project Next, compute MD5 using a tool like DigestUtils on the contents of the archive; this checksum is required for Solution Central Compute the SHA hash on the archive and save it; this will need to be provided along with the archive in the PUT requests Compute MD5 on the hash file also Finally, make the two PUT requests, one for the archive and one for its hash; for example cURL requests, see the Help Center Publish the solution using a PUT So, with a little more work it is now possible to make use of SC in a more custom DevOps process. It is now possible to build JSON or XML solutions using development tools outside of ThingWorx and still publish these customizations to Solution Central. The process of DevOps Management just became more versatile, and with the ease of deployment of DPM and other PTC building blocks as well, ThingWorx is now more accessible and easy to use than ever before.
View full tip
Persistent vs. Logged Properties By Mike Jasperson, VP of IoT EDC   Executive Summary ThingWorx provides several different “aspects” (or storage options) for how property values are saved.  These options each have different implications for performance and scalability.  Understanding those implications is important for designing a scalable IOT solution.   Persistent Properties are best used for non-telemetry data which will change infrequently (for example only a few times in a day) and where historical values are not required.  When overused, Persistent properties can put significant pressure on the database layer of your ThingWorx implementation, leading to poor performance of your IOT application.  As the number of Things in your IOT application scales up, the quantity or frequency of persistent properties per Thing needs to be carefully considered.   Logged Properties are best used for telemetry data where historical values need to be retained, but also for any other value that is expected to change frequently.  Logged properties can create some additional requirements: a process for handling null/default values after restarts, more disk space, and a data retention policy. There are benefits as well, though, like more flexibility and scalability for the ingestion of larger volumes of data.   Persistent + Logged Properties perform database operations of both aspects.  Combined use should be very limited – only properties that update infrequently (a few times a day), and that must be in-memory in the event of a ThingWorx restart.   In-Memory Only Properties are neither persistent nor logged – they are not stored to the database.  These properties can greatly improve scale for values that need to be available for the application to drive UIs or compute other derived values that will be stored.  However, high-frequency updates of in-memory properties can create scale challenges in HA (high availability) ThingWorx configurations where memory state needs to be constantly shared between multiple ThingWorx nodes.     Find a complete summary as well as example cases in the document attached.
View full tip
Is your team operating an effective DevOps pipeline? DevOps is an important part of a mature, enterprise ready application, but the process isn’t simple.   This expert session will focus on how a full DevOps pipeline looks like and how PTC can help to build a seamless pipeline. Join us for our upcoming Expert Session to learn how to create a Docker image, integrate Azure with Docker and Git, and set up a seamless DevOps pipeline.   When? Thursday, September 30th 2021 | 11 AM EST Host: Tori FIrewind, Senior Engineer in PTC IOT Enterprise Deployment Center Registration link: https://www.ptc.com/en/resources/iiot/webcast/devops-pipeline-thingworx 
View full tip
The New and Improved DGIS Guide to ThingWorx Development Written by Victoria Firewind of the IoT EDC   The classic Developing Great IoT Solutions guide has been reskinned and revamped for newer versions of ThingWorx! The same information on how to build a quality IoT application is now available for versions of ThingWorx 9.1+, and now, a complete sample application is included to demonstrate these ideas.    Find within the attached archive a PDF with high-level overview information on development and application design geared towards managers and business users, so that everyone can understand the necessary requirements, common terms, and key tips on how to ensure an application is scalable and maintainable right from the very start. Reduce your chances of running into issues between PoC and Go Live by reviewing this information today!   Also find within this PDF a series of tutorials which teach not just how to use the ThingWorx software, but which also educate on how to make good application design choices. A basic rules engine for sending real-time notifications is included here, as well as a complete demo application which illustrates each concept in a real-world use case. This Coffee Machine Demo App relies upon the tutorial entities, which can also now be imported directly using the other XML files provided here. This ensures that anyone can review these concepts, regardless of how much time one can commit or how much knowledge one already has on the subject.   This is a complex guide, and any issues, questions, or bugs found within can be reported right here on this thread. Happy developing from the IoT EDC!
View full tip
ThingWorx 8.5 Sizing Guide Sizing is a very important part of the application design process, answering such questions as: how much hardware is required? What specifications does this hardware need to handle the expected load? And therefore, what is the overall cost of setting up and maintaining the ThingWorx environment? Properly sizing the environment before development begins ensures that there are no unexpected costs or limitations to application functionality later on down the road.   "Hardware sizing is driven by many factors - some more easily calculated than others", as stated in the new ThingWorx 8.5 Sizing Guide. "Measures like data streaming frequency (the data ingestion component) and HTTP request volume (the data visualization component) are easily calculated... However, sizing considerations for the data processing component of the application can depend largely on business use cases and application design." Enterprise-Ready applications have the capacity to handle all aspects of an IoT application, from data ingestion and processing to data visualization, as detailed in the friendly infographic above, which many will recognize from LiveWorx. Inside the ThingWorx 8.5 Sizing Guide,  there are formulas designed to help size the more analytical aspects of the application, as well as descriptions of other factors and how they (conceptually) play a role in sizing.    There are also two application design examples which step the reader through the calculations, the comparisons, and the selections of hardware for each use case. New in this version, these examples have been simulated in the real world to prove that the theory behind these calculations is sound, and to demonstrate the full process of designing, sizing, and testing an application.  One of the examples (shown here) sizes a Connected Product Solution, something which has many, many remote things in the field, each writing to the Platform at a slower rate, for consumption by a large number of general users, who don't access the same mashups many times nor refresh their view very often. The second example is much more complex, modeling an industrial use-case, where there are many different kinds of users each accessing the mashups many times, fewer things, and more variations in the types of properties each thing possesses. These examples are designed to help anyone with any use case step through the sizing of their application properly.   Please check out the new ThingWorx 8.5 Sizing Guide, especially because each version of ThingWorx is different and must be sized accordingly. Comments and questions about the guide are very welcome right here on this thread!
View full tip
  The IoT Enterprise Deployment Center’s goal is to create and share knowledge around the best practices for architecting, designing, and deploying successful, enterprise-scale Thingworx IoT Solutions.    To accomplish this goal, the EDC team takes a “real world” approach, using simulated IoT assets and users to benchmark the capabilities of different Thingworx deployment configurations. First, each implementation is pushed to its limit in an effort to establish real-world baselines, metrics which can be used to help customers determine which architecture choices will work for their custom needs. Then, each implementation is pushed beyond its limits, providing useful insight into where and why things fail, and illuminating potential implementation changes which could push the boundaries further.   Through the simulations testing to come, the EDC will be publishing the resulting benchmarks for all to see! These benchmarks will include details on implementation goals and performance metrics for different stages of deployment. Additionally, best-practice articles which illustrate how to deploy the different architectural components (those referenced within the benchmarks) will also be posted, highlighting the optimal approach to integrating everything into the Thingworx platform.   Stay tuned to see more about just how versatile the ThingWorx Platform can be! We look forward to discussing these findings as they are published right here on the PTC Community. 
View full tip
Distributed Timer and Scheduler Execution in a ThingWorx High Availability (HA) Cluster Written by Desheng Xu and edited by Mike Jasperson    Overview Starting with the 9.0 release, ThingWorx supports an “active-active” high availability (or HA) configuration, with multiple nodes providing redundancy in the event of hardware failures as well as horizontal scalability for workloads that can be distributed across the cluster.   In this architecture, one of the ThingWorx nodes is elected as the “singleton” (or lead) node of the cluster.  This node is responsible for managing the execution of all events triggered by timers or schedulers – they are not distributed across the cluster.   This design has proved challenging for some implementations as it presents a potential for a ThingWorx application to generate imbalanced workload if complex timers and schedulers are needed.   However, your ThingWorx applications can overcome this limitation, and still use timers and schedulers to trigger workloads that will distribute across the cluster.  This article will demonstrate both how to reproduce this imbalanced workload scenario, and the approach you can take to overcome it.   Demonstration Setup   For purposes of this demonstration, a two-node ThingWorx cluster was used, similar to the deployment diagram below:   Demonstrating Event Workload on the Singleton Node   Imagine this simple scenario: You have a list of vendors, and you need to process some logic for one of them at random every few seconds.   First, we will create a timer in ThingWorx to trigger an event – in this example, every 5 seconds.     Next, we will create a helper utility that has a task that will randomly select one of the vendors and process some logic for it – in this case, we will simply log the selected vendor in the ThingWorx ScriptLog.     Finally, we will subscribe to the timer event, and call the helper utility:     Now with that code in place, let's check where these services are being executed in the ScriptLog.     Look at the PlatformID column in the log… notice that that the Timer and the helper utility are always running on the same node – in this case Platform2, which is the current singleton node in the cluster.   As the complexity of your helper utility increases, you can imagine how workload will become unbalanced, with the singleton node handling the bulk of this timer-driven workload in addition to the other workloads being spread across the cluster.   This workload can be distributed across multiple cluster nodes, but a little more effort is needed to make it happen.   Timers that Distribute Tasks Across Multiple ThingWorx HA Cluster Nodes   This time let’s update our subscription code – using the PostJSON service from the ContentLoader entity to send the service requests to the cluster entry point instead of running them locally.       const headers = { "Content-Type": "application/json", "Accept": "application/json", "appKey": "INSERT-YOUR-APPKEY-HERE" }; const url = "https://testcluster.edc.ptc.io/Thingworx/Things/DistributeTaskDemo_HelperThing/services/TimerBackend_Service"; let result = Resources["ContentLoaderFunctions"].PostJSON({ proxyScheme: undefined /* STRING */, headers: headers /* JSON */, ignoreSSLErrors: undefined /* BOOLEAN */, useNTLM: undefined /* BOOLEAN */, workstation: undefined /* STRING */, useProxy: undefined /* BOOLEAN */, withCookies: undefined /* BOOLEAN */, proxyHost: undefined /* STRING */, url: url /* STRING */, content: {} /* JSON */, timeout: undefined /* NUMBER */, proxyPort: undefined /* INTEGER */, password: undefined /* STRING */, domain: undefined /* STRING */, username: undefined /* STRING */ });   Note that the URL used in this example - https://testcluster.edc.ptc.io/Thingworx - is the entry point of the ThingWorx cluster.  Replace this value to match with your cluster’s entry point if you want to duplicate this in your own cluster.   Now, let's check the result again.   Notice that the helper utility TimerBackend_Service is now running on both cluster nodes, Platform1 and Platform2.   Is this Magic?  No!  What is Happening Here?   The timer or scheduler itself is still being executed on the singleton node, but now instead of the triggering the helper utility locally, the PostJSON service call from the subscription is being routed back to the cluster entry point – the load balancer.  As a result, the request is routed (usually round-robin) to any available cluster nodes that are behind the load balancer and reporting as healthy.   Usually, the load balancer will be configured to have a cookie-based affinity - the load balancer will route the request to the node that has the same cookie value as the request.  Since this PostJSON service call is a RESTful call, any cookie value associated with the response will not be attached to the next request.  As a result, the cookie-based affinity will not impact the round-robin routing in this case.   Considerations to Use this Approach   Authentication: As illustrated in the demo, make sure to use an Application Key with an appropriate user assigned in the header. You could alternatively use username/password or a token to authenticate the request, but this could be less ideal from a security perspective.   App Deployment: The hostname in the URL must match the hostname of the cluster entry point.  As the URL of your implementation is now part of your code, if deploy this code from one ThingWorx instance to another, you would need to modify the hostname/port/protocol in the URL.   Consider creating a variable in the helper utility which holds the hostname/port/protocol value, making it easier to modify during deployment.   Firewall Rules: If your load balancer has firewall rules which limit the traffic to specific known IP addresses, you will need to determine which IP addresses will be used when a service is invoked from each of the ThingWorx cluster nodes, and then configure the load balancer to allow the traffic from each of these public IP address.   Alternatively, you could configure an internal IP address endpoint for the load balancer and use the local /etc/hosts name resolution of each ThingWorx node to point to the internal load balancer IP, or register this internal IP in an internal DNS as the cluster entry point.
View full tip
ThingWorx DevOps with Azure: The Comprehensive DevOps Guide Written by Tori Firewind, IoT EDC   As promised in a previous post,  attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development.  There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and Github to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process whether an administrator or a developer, automate your deployments and testing, and create a more efficient process  for publication changes to production.  A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.  
View full tip
Thundering Herd Scenarios in ThingWorx Written by Jim Klink, Edited by Tori Firewind   Introduction The thundering herd topic is quite vast, but it can be broken down into two main categories: the “data flood” and the “reconnect storm”. One category involves what happens to the business login (the “data flood” scenario) and affects both Factory and Connected Products use cases. The other category involves bringing many, many devices back online in a short time (the “reconnect storm” scenario), which largely influences Connected Products scenarios.   Citation: https://gfp.sd.gov/buffalo-roundup/ Think of Connected Products as a thundering stampede of many small buffalo, which then makes a Factory thundering herd scenario a stampede of a couple massive brontosaurus, much fewer in number, but still with lots of persisted data to send back in. This article focuses in on how to manage the “reconnect storm” scenario, by delaying the return of individual buffalo to reduce the intensity of the stampede. Find here the necessary insights on how to configure your ThingWorx edge applications to minimize the effect of a server down scenario.    The C-SDK will be used for examples, but the general principles will apply to any of the ThingWorx edge options (EMS, .Net SDK, Java SDK).  This article also references the ExampleAgent application which is built using the C-SDK. The ExampleAgent is available for download as an attachment to this post.  It offers an easily configurable edge solution for Windows and Linux that can be used for the following purposes: Foundation for rapid development of a robust custom edge application based on the ThingWorx C-SDK for use by customers and partners. Full featured, well documented, ‘C’ source code example of developing an application using the ThingWorx C-SDK. A “local” issue is one which affects a single agent, a loss of connectivity due to hardware malfunction or local network issues. Local issues are quite common in the IoT world, and recovery usually isn’t too much of a challenge. A “global” issue occurs when many agents disconnect simultaneously, usually because there is an issue with the ThingWorx server itself (though the Load Balancer, Connection Server, or web hosting software could also be the source). Perhaps it is a scheduled software update, perhaps it is unexpected downtime due to issues, but either way, it’s important to consider how the fleet of agents will respond if ThingWorx suddenly becomes unavailable.   There are two broad issues to consider in a situation like this. One is maintaining the agent’s data so that it can be sent when the connection becomes available again. This can be done in the C-SDK using an offline file storage system, which includes properties, events, and services. Offline storage is configured in the twConfig.h file in the C SDK.  The second issue the number of Agents seeking to reconnect to the server in a short period of time when the server is available.    Of course, if revenue is based on uptime, perhaps persisting data is less critical and can be lost, making things simple. However, in most cases, this data will need to be stored on the edge device until reconnect. Then, once the server comes back up, suddenly all of this data comes streaming in from all of the many edge devices simultaneously.   This flood of both data and reconnection of a multitude of agents can create what is called a “thundering herd” scenario, in which ThingWorx can become backlogged with data processing requests, data can be lost if the queues are overwhelmed, or worst-case, the Foundation server can become unresponsive once again. This is when outages become costly and drag on longer than necessary. Several factors can lead to a thundering herd scenario, including the number of agents in the fleet, the amount of stored data per agent, the amount of data ordinarily sent by these devices, which is sent side-by-side with the stored data upon reconnection, and how much processing occurs once all of this data is received on the Foundation server.   The easiest way to mitigate a potential thundering herd scenario, and this is considered a ThingWorx best practice as well, is to randomize the reconnection of devices. Each agent can be configured to delay itself by a random amount of time before attempting to reconnect after a loss of connectivity. This random delay then distributes the number of assets connecting at a time over a longer period, thus minimizing the impact of the reconnections on ThingWorx. There are several configuration settings that help in this regard.   Configuring the Herd (C-SDK) The C-SDK is great at managing agent connectivity, having a lot of options for fine-tuning the connections. The web-socket connection is managed by the SDK layer of the edge device (which also manages the retry process). To review the source code for how connections are made, see the C-SDK file found here: src\api\twApi.c, specifically the function called twApi_Connect().   The ExampleAgent uses custom configuration files to manage this process from the application layer, a more robust and complete solution. Detailed here are the configuration options in the ExampleAgent attached to this post, most of which can be found in its ws_connection.json configuration file: connect_timeout is used throughout the C-SDK as the time to wait for a web-socket connection to be established (i.e. the ‘timeout’ value). This is the maximum delay for the socket to be established or to send and receive data. If it is established sooner, then a success code is returned. If a connection is not established in the configured timeout period, then an error is returned. Setting this value to 10 seconds is reasonable, for reference. connect_retries is the number of times the SDK will attempt to establish a connection before the twApi_Connect() function returns an error. Setting this to -1 will trigger the SDK to stay in the loop infinitely until a connection is established. connect_retry_interval is the delay between connection retries. max_connect_delay is used as a delay before even entering the loop, that which uses the connect_retries and connect_retry_interval parameters to establish the connection. The SDK function twAddConnectionDelay() is called, which delays by a random amount of time between 0 seconds and the value given by this parameter. This random delay is only used once per call to twApi_Connect().  This is therefore the parameter most critical to preventing thundering herd scenarios (as discussed above). Configuring the SDK agents to reconnect in this way is critical, but there are also some drawbacks, namely that while the twApi_Connect() function is running, there is no clean way of shutting the agent down. Likewise, the agent only does ONE randomized delay per call of the twApi_Connect() function, meaning that if reconnection cannot occur immediately, it’s still possible for many agents to try to reconnect at once. Consider this when determining what values to assign to these parameters.   ExampleAgent Design The ExampleAgent provided here is a fully implemented, configurable application, like the EMS in terms of functionality, but containing only simulated data. The data capture component is missing here and has to be custom developed. Attached alongside this source code is extensive documentation that explains how to get the application set up and configured. This isn’t meant to be used directly in a production environment.   Please note that the ExampleAgent is provided as-is; it is not an officially released product by PTC.   This disclaimer includes the ExampleAgent source code, build process, documentation, deliverables as well as any ExampleAgent modifications to the official releases of the C-SDK or the SCM extension product. Full and sole responsibility for the use, deployment, reliability, and accuracy of any ExampleAgent related code, documentation, etc. falls to the user, and any use of the ExampleAgent is an implicit agreement with this disclaimer.   The ExampleAgent was developed by PTC sales and services to help in the Edge application development process.  For assistance, support, or additional development, an authorized statement of work is needed.  Please Note:  PTC support is not aware of the existence of the ExampleAgent and cannot provide assistance.    Because of the small downside to configuring the twApi_Connect() function directly as discussed above, there is alternative approach given here as well. The ExampleAgent module ConnectionMgr.c controls the calling of the twApi_Connect() on a dedicated connection thread. The ConnectionMgrThreadFunction() contains the source code necessary to understanding this process.   The ConnectionMgr.c workflow and source code visualization via Microsoft Visual Studio are in the diagrams below. The ExampleAgent defines its own randomized delay to mitigate the thundering herd scenario while still deploying an edge system that responds to shut down requests cleanly. In this case, the randomized delay is configured by the parameter reconnect_random_delay_seconds in the agent_config.json file. Since the ConnectionMgrThreadFunction() controls the calling of twApi_Connect(), the ConnectionMgrThreadFunction() will delay the randomized value EVERY time before calling this reconnect function. A separate thread is created to call the reconnect function so that there are still resources available for data processing and to check for shutdown signals and other conditions.   Recommended Values These recommendations are based around managing the reconnection process from the application layer. These may be different if the C-SDK is configured directly, but creating application layer management is recommended and provided in the ExampleAgent attached. The ExampleAgent is configured by default to simplify the SDK layer’s involvement.   These configuration options tell the SDK layer to try to connect just once, after just 1 second: There is no official recommendation for the above values due to the fact that every use case is different and will require different fine tuning to work well.   Then this setting here handles the retry process from within the application layer of the ExampleAgent: Conclusion To reduce the chances of a thundering herd scenario, configure the fleet to reconnect after differing random delays. The larger the random delay times, the longer it takes for the fleet to come back online and fleet data to be received. While more complex ThingWorx deployment architectures (such as container-based deployments like Kubernetes or Thingworx High Availability (HA) clusters) can also help to address the increased peak load during a thundering herd event, randomized reconnect delays can still be an effective tool.        
View full tip