cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 
Sort by:
The IoT Building Block Design Framework By Victoria Firewind and Ward Bowman, Sr. Director of the IoT EDC   Building Block Overview As detailed quite extensively on its own designated Help Center page, building blocks are the future of scalable and maintainable IoT architecture. They are a way to organize development and customization of ThingWorx solutions into modular, well-defined components or packages. Each building block serves a specific purpose and exists as independently as possible from other modules. Some blocks facilitate external data integration, some user interface features, and others the manipulation or management of different kinds of equipment. There are really no limits to how custom a ThingWorx solution can be, and customizations are often a major hurdle to a well-oiled dev ops pipeline. It’s therefore crucial for us all to use a standard framework, to ensure that each piece of customization is insular, easy-to-replace, and much more maintainable. This is the foundation of good IoT application design.   PTC’s Building Block Framework At PTC, building blocks are broken down in a couple different ways: categories and types. The category of a building block is primarily in reference to its visibility and availability for use by the greater ThingWorx community. We use our own framework here at PTC, so our solution offerings are based around solution-specific building blocks, things which we provide as complete, SaaS solutions. These solution-specific building blocks combine up into single solutions like DPM, offerings which require a license, but can then easily be deployed to a number of systems. PTC solutions provide many, complex, OOTB features, like the Production Dashboard of DPM, and the OperationKPI building block.   Anyone doing any sort of development with ThingWorx, however, does still have access to many other building blocks, included with the domain specific building block category. These are the pieces used to build the solution building blocks like DPM, and they can be used to build other more custom solutions as well. Take the OperationKPI block which can vary greatly from one customer to the next in how it is calculated or analyzed. The pieces used to build the version that ships with DPM are right there, for instance, Shift and ReasonCode. They are designed to have minimal dependencies themselves, meant to be used as dependencies by custom blocks which do custom versions of the module logic found within the PTC solution-specific blocks. Then there are the common blocks as well, and these are used even more widely for things like user management and database connectivity.   The type of the building block refers really to where that building block falls in the greater design. A UI building block consumes data and displays it, so that is the View of a classic MVC design pattern. However, sometimes user input is needed, so perhaps that UI building block will depend on another block. This other block could have a utility entity that fuels UI logic, and the benefit to having that be a separate block is then the ease with which it can be subbed out from one ThingWorx instance to the next.   Let’s say there are regional differences in how people make use of technologies. If the differences are largely driven by what data is available from physical devices at a particular site, then perhaps these differences require different services to process user input or queries on the same UI mashups used across every site. Well in this case, having the UI block stand alone is smart, because then the Model and Controller blocks can be abstracted out and instantiated differently at different sites.   The two types that largely define the Model and Controller of the classic MVC are called Abstract and Implementation building blocks, and the purposes are intertwined, but distinct: abstract building blocks expose the common API endpoints which allow for the implementation blocks to vary so readily. The implementation blocks are then those which actually alter the data model, which is what happens when the InitializeSolution method is called. In that service, they are given everything they need to generate their data tables and data constraints, so that they are ready to be used once the devices are connected.   Occasionally, UI differences will also be necessary when factories or regions have different ways of doing things. When this happens, even mashups can be abstracted using abstract building blocks. Modular mashups which can be combined into larger ones can be provided in an abstract building block, and these can be used to form custom mashups from common parts across many sites in different implementation blocks all based on the same abstract one.   The last type of building block is the standard building block, which is the most generic. This one is not intended to be overridden, often serving as the combination of other building blocks into final solutions. It is also the most basic combination of components necessary to adhere to the building block design framework and interface with the shared deployment infrastructure. The necessary components include a project entity, which contains all of the entities for the building block, an entry point, which contains the metadata (name, description, version, list of dependencies, etc.) and overrides the service for automatic model generation (DeployComponent), and the building block manager. The dashed arrows indicate an entity implements another, and the solid arrows, extension. The manager is the primary service layer for the building block, and it makes most of the implementation decisions. It also has all of the information required to configure menus and select which contained mashups to use when combining modular mashups into larger views. The manager really consists of 3 entities: a thing template with the properties and configuration tables, a thing which makes use of these, and a thing shape which defines all of the services (which can often be overridden) which the manager thing may make use of.   Most building blocks will also contain security entities which handle user permissions, like groups which can be updated with users. Anyone requiring access to the contents of a particular building block then simply needs to be added to the right groups later on. This as well as model logic entities like thing shapes can be used concurrently to use organizational security for visibility controls on individual equipment, allowing some users to see some machines and not others, and so on.   All of the Managers must be registered in the DefaultGlobalManagerConfiguration table on the PTC.BaseManager thing, or in the ManagerConfiguration table of any entity that implements the PTC.Base.ConfigManagement_TS thing shape. Naming conventions should also be kept standard across blocks, and details on those best practices can be found in the Build Block Help Center.   Extending the Data Model Using Building Blocks Customizing the data model using building blocks is pretty straight-forward, but there are some design considerations to be made. One way to customize the data model is to add custom properties to existing data model entities. For instance, let’s say you need an additional field to keep track of the location of a job order, and so you could add a City field to PTC.JobOrder.JobOrder_AP data shape. However, doing this also requires substantial modification to the PTC.JobOrder.Manager thing, if the new data shape field is meant to interface with the database.   So this method is not as straight-forward as it may seem, but it is the easiest solution in an “object-oriented” design pattern, one where the data varies very little from site to site, but the logic that handles that data does vary. In this design pattern, there will be a thing shape for each implementing building block that handles the same data shape the abstract building block uses, just in different ways. A common use case for why this may happen is if the UI components vary from site to site or region to region, and the logic that powers them must also vary, but the data source is relatively consistent.   Another way to extend the data model is to add custom data shapes and custom managers to go with them. This requires you to create a new custom building block, one which extends the base entry point as needed for the type of block. This method may seem more complicated than just extending a data shape, but it is also easier to do programmatically: all you have to do is create a bunch of entities (thing templates, thing shapes, etc.) which implement base entities. A fresh data shape can be created with complete deliberation for the use case, and then services which already exist can be overridden to handle the new data shape instead. It is a cleaner, more automation friendly approach.   However, the database will still need to be updated, and this time CRUD services are necessary as well, those for creating and managing instances of the data shape. So, this option is not really less effort in the short term. In the long-term, though, scripts that automate much of the process for building block generation can be used to quickly and easily allow for development of new modules in a more complex ThingWorx solution. The ideal is to use abstract blocks defined by the nature of the data shape which each of their implementing blocks will use to hook the view into the data model.   This “data-driven” design pattern is one which involves creating a brand-new block for each customization to the data model. The functionality of each of these blocks centers around what the data table and constraints must look like for that data store, and the logic to handle the different data types should vary within each implementation block, but override the common abstract block interface so that any data source can be plugged into the thing model, and ThingWorx will know what to do with it.   The requests can then contain whatever information they contain (like MQTT), and which logic is chosen to process that data will be selected based on what information is received by the ThingWorx solution for each message. Data shapes can be abstracted in this way, so that a single subscription need exist to the data shape used in the abstract building block, and its logic knows how to call whatever functions are necessary based on whatever data it receives. This allows for very maintainable API creation for both data ingestion and event processing.
View full tip
Using the Solution Central API Pitfalls to Avoid by Victoria Firewind, IoT EDC   Introduction The Solution Central API provides a new process for publishing ThingWorx solutions that are developed or modified outside of the ThingWorx Platform. For those building extensions, using third party libraries, or who just are more comfortable developing in an IDE external to ThingWorx, the SC API makes it simple to still utilize Solution Central for all solution management and deployment needs, according to ThingWorx dev ops best practices. This article hones in one some pitfalls that may arise while setting up the infrastructure to use the SC API and assumes that there is already AD integration and an oauth token fetcher application configured for these requests.   CURL One of the easiest ways to interface with the SC API is via cURL. In this way, publishing solutions to Solution Central really involves a series of cURL requests which can be scripted and automated as part of a mature dev ops process. In previous posts, the process of acquiring an oauth token is demonstrated. This oauth token is good for a few moments, for any number of requests, so the easiest thing to do is to request a token once before each step of the process.   1. GET info about a solution (shown) or all solutions (by leaving off everything after "solutions" in the URL)     $RESULT=$(curl -s -o test.zip --location --request GET "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingoriginal12345:1.0.0/files/SampleTwxExtension.zip" ` --header "Authorization: Bearer $ACCESS_TOKEN" ` --header 'Content-Type: application/json' ` )     Shown here in the URL, is the GAV ID (Group:Artifact_ID:Version). This is shown throughout the Swagger UI (found under Help within your Solution Central portal) as {ID}, and it includes the colons. To query for solutions, see the different parameter options available in the Swagger UI found under Help in the SC Portal (cURL syntax for providing such parameters is shown in the next example).   Potential Pitfall: if your solution is not published yet, then you can get the information about it, where it exists in the SC repo, and what files it contains, but none of the files will be downloadable until it is published. Any attempt to retrieve unpublished files will result in a 404.   2. Create a new solution using POST     $RESULT=$(curl -s --location --request POST "https://<your_sc_url>/sc/api/solutions" ` --header "Authorization: Bearer $ACCESS_TOKEN" ` --header 'Content-Type: application/json' ` -d '"{\"groupId\": \"org.ptc\", \"artifactId\": \"somethingelseoriginal12345\", \"version\": \"1.0.0\", \"displayName\": \"SampleExtProject\", \"packageType\": \"thingworx-extension\", \"packageMetadata\": {}, \"targetPlatform\": \"ThingWorx\", \"targetPlatformMinVersion\": \"9.3.1\", \"description\": \"\", \"createdBy\": \"vfirewind\"}"' )     It will depend on your Powershell or Bash settings whether or not the escape characters are needed for the double quotes, and exact syntax may vary. If you get a 201 response, this was successful.   Potential Pitfalls: the group ID and artifact ID syntax are very particular, and despite other sources, the artifact ID often cannot contain capital letters. The artifact ID has to be unique to previously published solutions, unless those solutions are first deleted in the SC portal. The created by field does not need to be a valid ThingWorx username, and most of the parameters given here are required fields.   3.  PUT the files into the project     $RESULT=$(curl -L -v --location --request PUT "<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/files" ` --header "Authorization: Bearer $ACCESS_TOKEN" ` --header 'Accept: application/json' ` --header 'x-sc-primary-file:true' ` --header 'Content-MD5:08a0e49172859144cb61c57f0d844c93' ` --header 'x-sc-filename:SampleTwxExtension.zip' ` -d "@SampleTwxExtension.zip" ) $RESULT=$(curl -L --location --request PUT "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/files" ` --header "Authorization: Bearer $ACCESS_TOKEN" ` --header 'Accept: application/json' ` --header 'Content-MD5:fa1269ea0d8c8723b5734305e48f7d46' ` --header 'x-sc-filename:SampleTwxExtension.sha' ` -d "@SampleTwxExtension.sha" )     This is really TWO requests, because both the archive of source files and its hash have to be sent to Solution Central for verifying authenticity. In addition to the hash file being sent separately, the MD5 checksum on both the source file archive and the hash has to be provided, as shown here with the header parameter "Content-MD5". This will be a unique hex string that represents the contents of the file, and it will be calculated by Azure as well to ensure the file contains what it should.   There are a few ways to calculate the MD5 checksums and the hash: scripts can be created which use built-in Windows tools like certutil to run a few commands and manually save the hash string to a file:      certutil -hashfile SampleTwxExtension.zip MD5 certutil -hashfile SampleTwxExtension.zip SHA256 # By some means, save this SHA value to a file named SampleTwxExtension.sha certutil -hashfile SampleTwxExtension.sha MD5       Another way is to use Java to generate the SHA file and calculate the MD5 values:      public class Main { private static String pathToProject = "C:\\Users\\vfirewind\\eclipse-workspace\\SampleTwxExtension\\build\\distributions"; private static String fileName = "SampleTwxExtension"; public static void main(String[] args) throws NoSuchAlgorithmException, FileNotFoundException { String zip_filename = pathToProject + "\\" + fileName + ".zip"; String sha_filename = pathToProject + "\\" + fileName + ".sha"; File zip_file = new File(zip_filename); FileInputStream zip_is = new FileInputStream(zip_file); try { // Calculate the MD5 of the zip file String md5_zip = DigestUtils.md5Hex(zip_is); System.out.println("------------------------------------"); System.out.println("Zip file MD5: " + md5_zip); System.out.println("------------------------------------"); } catch(IOException e) { System.out.println("[ERROR] Could not calculate MD5 on zip file named: " + zip_filename + "; " + e.getMessage()); e.printStackTrace(); } try { // Calculate the hash of the zip and write it to a file String sha = DigestUtils.sha256Hex(zip_is); File sha_output = new File(sha_filename); FileWriter fout = new FileWriter(sha_output); fout.write(sha); fout.close(); System.out.println("[INFO] SHA: " + sha + "; written to file: " + fileName + ".sha"); // Now calculate MD5 on the hash file FileInputStream sha_is = new FileInputStream(sha_output); String md5_sha = DigestUtils.md5Hex(sha_is); System.out.println("------------------------------------"); System.out.println("Zip file MD5: " + md5_sha); System.out.println("------------------------------------"); } catch (IOException e) { System.out.println("[ERROR] Could not calculate MD5 on file name: " + sha_filename + "; " + e.getMessage()); e.printStackTrace(); } }     This method requires the use of a third party library called the commons codec. Be sure to add this not just to the class path for the Java project, but if building as a part of a ThingWorx extension, then to the build.gradle file as well:     repositories { mavenCentral() } dependencies { compile fileTree(dir:'twx-lib', include:'*.jar') compile fileTree(dir:'lib', include:'*.jar') compile 'commons-codec:commons-codec:1.15' }       Potential Pitfalls: Solution Central will only accept MD5 values provided in hex, and not base64. The file paths are not shown here, as the archive file and associated hash file shown here were in the same folder as the cURL scripts. The @ syntax in Powershell is very particular, and refers to reading the contents of the file, in this case, or uploading it to SC (and not just the string value that is the name of the file). Every time the source files are rebuilt, the MD5 and SHA values need to be recalculated, which is why scripting this process is recommended.   4. Do another PUT request to publish the project      $RESULT=$(curl -L --location --request PUT "https://<your_sc_url>/sc/api/solutions/org.ptc:somethingelseoriginal12345:1.0.0/publish" ` --header "Authorization: Bearer $ACCESS_TOKEN" ` --header 'Accept: application/json' ` --header 'Content-Type: application/json' ` -d '"{\"publishedBy\": \"vfirewind\"}"' )     The published by parameter is necessary here, but it does not have to be a valid ThingWorx user for the request to work. If this request is successful, then the solution will show up as published in the SC Portal:    Other Pitfalls Remember that for this process to work, the extensions within the source file archive must contain certain identifiers. The group ID, artifact ID, and version have to be consistent across a couple of files in each extension: the metadata.xml file for the extension and the project.xml file which specifies which projects the extensions belong to within ThingWorx. If any of this information is incorrect, the final PUT to publish the solution will fail.   Example Metadata File:     <?xml version="1.0" encoding="UTF-8"?> <Entities> <ExtensionPackages> <ExtensionPackage artifactId="somethingoriginal12345" dependsOn="" description="" groupId="org.ptc" haCompatible="false" minimumThingWorxVersion="9.3.0" name="SampleTwxExtension" packageVersion="1.0.0" vendor=""> <JarResources> <FileResource description="" file="sampletwxextension.jar" type="JAR"></FileResource> </JarResources> </ExtensionPackage> </ExtensionPackages> <ThingPackages> <ThingPackage className="SampleTT" description="" name="SampleTTPackage"></ThingPackage> </ThingPackages> <ThingTemplates> <ThingTemplate aspect.isEditableExtensionObject="false" description="" name="SampleTT" thingPackage="SampleTTPackage"></ThingTemplate> </ThingTemplates> </Entities>       Example Projects XML File:     <?xml version="1.0" encoding="UTF-8"?> <Entities> <Projects> <Project artifactId="somethingoriginal12345" dependsOn="{&quot;extensions&quot;:&quot;&quot;,&quot;projects&quot;:&quot;&quot;}" description="" documentationContent="" groupId="org.ptc" homeMashup="" minPlatformVersion="" name="SampleExtProject" packageVersion="1.0.0" projectName="SampleExtProject" publishResult="" state="DRAFT" tags=""> </Project> </Projects> </Entities>       Another large issue that may come up is that requests often fail with a 500 error and without any message. There are often more details in the server logs, which can be reviewed internally by PTC if a support case is opened. Common causes of 500 errors include missing parameter values that are required, including invalid characters in the parameter strings, and using an API URL which is not the correct endpoint for the type of request. Another large cause of 500 errors is providing MD5 or hash values that are not valid (a mismatch will show differently).    Another common error is the 400 error, which happens if any of the code that SC uses to parse the request breaks. A 400 error will also occur if the files are not being opened or uploaded correctly due to some issue with the @ syntax (mentioned above).  Another common 400 error is a mismatch between the provided MD5 value for the zip or SHA file, and the one calculated by Azure ("message:  Md5Mismatch"), which can indicate that there has been some corruption in the content of the upload, or simply that the MD5 values aren't being calculated correctly. The files will often say they have 100% uploaded, even if they aren't complete, errors appear in the console, or the size of the file is smaller than it should be if it were a complete upload (an issue with cURL).   Conclusion Debugging with cURL can be a challenge. Note that adding "-v" to a cURL command provides additional information, such as the number of bytes in each request and a reprint of the parameters to ensure they were read correctly.  Even still, it isn't always possible for SC to indicate what the real cause of an issue is. There are many things that can go wrong in this process, but when it goes right, it goes very right. The SC API can be entirely scripted and automated, allowing for seamless inclusion of externally-developed tools into a mature dev ops process.
View full tip
User Load Testing in ThingWorx Java Client Tutorial Written by Tori Firewind, IoT EDC   Introduction As stated in previous posts, user load testing is a critical component of ensuring a ThingWorx solution is Enterprise-ready. Even a sturdy new feature that seems to function well in development can run into issues once larger loads are thrown into the mix. That's why no piece of code should be considered production-ready until it has undergone not just unit and integration testing (detailed in our Comprehensive DevOps Guide), but also load testing that ensures a positive user experience and an adequately sized server to facilitate the user load.    The EDC has spent quite a few posts detailing the process of setting up an accurate, real-world testing suite using JMeter for ThingWorx. In this piece, we detail an alternative approach that makes use of the Java Spring Boot Framework to call rest requests against the ThingWorx server and simulate the user load. This Java Client tutorial produces a very immature user load client, one which would still take a lot of development to function as flexibly as the JMeter tutorial counterpart. For Java developers, however, this is still a very attractive approach; it allows for more custom, robust testing suites that come only as an investment made in a solid testing tool.   For someone experienced in Java, the risk is smaller of overlooking some aspect of simulation that JMeter may have handled automatically. For example, JMeter automatically creates more than one HTTP session, and it's much easier to implement randomized user logins instead of one account. The Java Client could do it with some extra work (not demonstrated here), but it uses just the Administrator login by default for a quick and dirty sort of load test, one focused less on the customer experience and more on server and database performance under the strain of the user requests (the method used in our sizing guidance, for instance, to see if a server is sized correctly).   The amount of time required to develop a Java Client isn't so bad for a Java developer, and when compared with learning the JMeter Framework, might be a better investment. A tool like this can handle a greater number of threads on a single testing VM; JMeter caps out around 250 threads per client on an 8Gb VM (under ideal conditions), while a Java Client can have thousands of threads easily. Likewise, a Java Client has less memory overhead than JMeter, less concern for garbage collection, and less likelihood that influence from heap memory management will affect the test results.   However, remember that everything in a Java Client has to be built from scratch and maintained over time. That means that beyond the basic tutorial here, there needs to be some kind of metrics gathering and analysis tool implemented (JMeter has built-in reporting tools), the calls need to be randomized, and not called at set intervals like they are here (which is not a very accurate representation of user load compared to a real-world scenario), and the number of users accessing the system at once should probably vary over time (to resemble peak usage hours). JMeter has a recording tool to ensure all the necessary REST requests to simulate a mashup load are made, so great care has to be taken to ensure all of the necessary REST calls for a mashup are made by the Java Client if a true simulation is called for by that approach.    Java Client Tutorial   Conclusion Neither a Java Client nor a JMeter testing suite is inherently better than the other, and both have their place within PTC's various testing processes. The best test of all is to stand up any sort of user load testing client, either of these approaches, at the same time as the UAT or QA user experience testing. QA testers who load and click about on mashups in true, user fashion can then see most accurately how the mashups will perform and what the users will experience in the Enterprise-ready, production application once the changes go out.
View full tip
Solution Central and Azure Active Directory Written by: Tori Firewind, IoT EDC   As we’ve said in a previous post, Solution Central (SC) is a crucial part of any mature dev ops pipeline. In its latest version (3.1.0), it manages custom solutions even more easily due to the SC API. However, this comes with a few requirements that can be a little tricky.   One of the more complex configuration pieces for using this new API involves a cloud-hosted Active Directory (AD) application within Azure AD. In order to make use of the new API, your organization’s users must exist on an Azure AD tenant that is separate from the PTC tenant. Most PTC customers are placed on this PTC-owned tenant by default, so additional configuration may be required to set up the AD instance within Azure before the SC API can be used. The type of tenant has to be Azure, of course, as that is what PTC uses: a multi-tenancy Azure infrastructure for all authentication.   In this usage, “tenant” is a cloud-hosting, infrastructure term which essentially refers to an Azure VM hosting an Active Directory server, as well as managing the many AD components that would otherwise require a lot of oversight. Users within that AD server are grouped so that only those solutions published by their own organizations are visible within Solution Central. In this way, the term “tenant” can also refer to a partition of some kind of data; a Solution Central tenant includes the users, the solutions, and is sort of like a sub-tenant of the larger PTC infrastructure. PTC uses a global Solution Central tenant for deployment of its own solutions, like DPM (Digital Performance Management), which is therefore available to every user in the PTC Azure AD tenant or a tenant which has been connected using the tutorial below.   There are good reasons to want to use your own Azure AD integrated into Solution Central that do not involve using the SC API. For one thing, it allows for direct control over the users that have access; otherwise, a ticket to PTC support is necessary every time a user needs access granted or removed. The tutorial below is a great reference for anyone using Solution Central, with steps 1-4 offering an easy guide for integrating your own Azure AD and simplifying your user management.   To use the SC API, however, there are additional steps required to create an application for retrieving an oauth token. This application needs the “solution-publisher” role or else bad request/forbidden errors will pop up. This role is automatically available in your Azure AD tenant once it is linked to the PTC AD tenant, but it does have to be manually assigned to the application, whose sole function is to request this access token for authenticating requests against the SC server.   The Azure application you need to create is essentially a plugin with permission to publish solutions against the SC API. It functions a little like a “login” in database terminology, serving the function of authenticating to an endpoint (in this case not tied to a user or any kind of identity). This application must be able to request an access token successfully and return that to the scripts which call upon it in order to perform the project creation and publication requests. The tutorial below steps you through how to create this application and request all of your solutions via the SC API.   Tutorial: Create Azure AD Application to Access SC via API Create a new tenant in Azure AD for users who will have access to the SC API Create a ticket with PTC to provide the tenant ID and begin the onboarding process Once that process completes, you will receive an email with a custom link to login to the Solution Central portal for your organization, where only your solutions exist Users will then need to be added as “Custom Global Administrators” on the SC enterprise application within Azure Portal to grant them access to login to Solution Central in a browser In order for us to be able to use the SC API, however, more work needs to be done; an application to request an oauth token for requests must be created in Azure Portal Navigate to “App Registrations” from the Azure AD page in Azure Portal and then click “New registration” Enter the application name (ours is called “gradle-plugin-tokenfetcher” in the examples shown here), and select the “single tenant” radio button Enter a URL for redirect URI and for “Select a platform” select “Web” Click “Register”, and wait for the application created notification Now we need to add permissions to see Solution Central Open the app from under “App registrations” and select the “API permissions” tab Click “Add a permission” and in the pop-up window that appears, select the “APIs my organization uses” tab, which should have PTC Solution Central listed, if this tenant has been linked to the PTC tenant per step 2 above Select “Application Permissions” and then check the “solution-publisher” role Click “Add permissions” The application now has access to Solution Central as a publisher, able to send solution publications over the API and not just via the Platform interface   The application is ready now, so we can create a request to test that it has access to SC Using Windows Powershell or something similar, create a script to make a couple of cURL requests against first the Azure AD and then the Solution Central API (attached as well): The tenant ID and the directory ID are the same value (listed on the tenant under “Overview”), and the client ID and application ID are the same as well (listed on the application, shown here), so don’t be confused by terminology The client app secret is the “Value” provided by the system when a new client secret is created under “Certificates & Secrets” (remember to copy this as soon as it is made and before clicking away, as after that, it will no longer be visible): The Solution Central app ID can be found under “App registrations”, listed within the details for the “solution-publisher” role: This access token request piece must be done before every request to the SC API, so this secret should be kept in some kind of password management tool (or a global environment variable in GitLab) so that it isn’t found anywhere in the source code If the script pings the SC API successfully, then a list of solutions will be returned
View full tip
The DPM User Experience Written by Tori Firewind, IoT EDC Team   As discussed in a previous post, DPM is a tool designed to be beneficial at all levels of a company, from the operators monitoring automated data on production events from the factory machines themselves, to the production supervisors who need to establish, task out, and track machine maintenance and improvement measures. DPM also engages the continuous improvement and plant leadership, by providing a standardized way to monitor performance that ultimately rolls up to the executive level. The end users of DPM are therefore diverse both in how they access DPM, and how they make use of its various features.   One of the perks to building DPM on top of the ThingWorx Foundation is that many of the webpages (called “mashups”) within ThingWorx are already responsive, and any  which aren’t responsive OOTB can be modified and custom designed for different size viewing screens to ensure that if necessary, end users can access DPM    from a variety of locations and devices. Most of the time, end users will be accessing mashups from hard-wired dashboards mounted on the actual devices,    or from wireless laptops which have standard size screens with standard resolutions. For use cases involving phones or tablets, however, it may be necessary to see how DPM will perform across a variety of bandwidth and latency conditions. Often, cellular or satellite connection is a must to facilitate field team cooperation, and 5G networks often result in worsened performance.   So, to demonstrate the influence of bandwidth and latency on the responsiveness of DPM, the Production Dashboard was loaded in the Google Chrome browser repeatedly under varying conditions. This dashboard is the webpage most operators and field users would access to log event information and production details (so it is widely used by end users). This provides a sort of benchmark of the DPM solution, something which indicates what can be expected and tells us a few things about how DPM should be deployed and configured.   Latency was introduced by hosting the servers involved in the test in different regions (all Azure cloud hosted servers, one in US East, one US West, and one in Japan East). Bandwidth was introduced using a tool on the PC with either no bandwidth or 4 megabits/second.   Browser caching was turned on and off as well, to simulate the difference between new and return users; new users would not have the webpage cached, so their load times are expected to be longer. Tomcat compression was also configured in half of the runs to demonstrate the importance of compression for optimal performance.   Each of these 24 scenarios was then tested 10 times from each location, and the actual data can be found in the attached benchmark document (a working  solution benchmark, which is not designed to be referenced directly, as matters of infrastructure may influence the exact performance of the solution).  Even with bandwidth, every region sees better performance for return users versus new users, which may be important to note. However, because DPM field users most commonly access DPM often, the return user time is a better indicator of adoption, and those numbers look great in our simulations. Notice the top line which shows the very worst of mobile performance, what happens over networks with bandwidth when Tomcat Compression is not enabled. Load times vary only slightly for regular networks when Tomcat Compression is enabled, and they vastly improve performance across regions and on mobile networks, so it is highly recommended (instructions on how to enable are below).   Key Takeaways Latency and bandwidth impact DPM performance in exactly the way one would expect of a web application. While any DPM server can be accessed from any region, regions with more latency will experience delays proportional to the amount of latency. In the chart here, find the three regions represented three times by three different colors (different from the charts above): The three different shades of each color represent the different regions Green represents the optimal configuration settings (Tomcat compression enabled, caching turned on) for returning users with bandwidth limitations (i.e. mobile networks like 5G) Blue shows first-time page visitors with no bandwidth limitations Purple shows first-time visitors that do have bandwidth limitations The uncompressed first-time load for mobile users (those with bandwidth limitations imposed) within the same region is also given to demonstrate the importance of enabling Tomcat Compression (load times only get worse without compression the farther the region) Notice how the green series has lower load times across the board than the blue one, meaning that return users even with bandwidth limitations have better performance across every region than new users. Also notice how the gap is larger between lighter colors and darker colors, where the darker the color, the farther the region from the DPM servers. This indicates that network latency has a more significant influence on performance versus bandwidth, with only longer running transactions like file uploads seeing a significant performance hit when on a network with bandwidth limitations.  Find out how to enable tomcat compression  and review the full solution benchmark in the document attached.  
View full tip
Announcing: ThingWorx Solution Central 3.1.0 and its New API Written by: Tori Firewind of the IoT EDC   Solution Central 3.1.0 ThingWorx Solution Central (SC) is the solution management tool for ThingWorx and Digital Performance Management (DPM), the latest version of which (DPM 1.1) can now be deployed directly from the PTC Solutions menu of SC. Streamlining packaging strategies and ensuring efficient solution deployment can now be done for all kinds of ThingWorx solutions, even those with heavy customization. The new API allows for updated building blocks from within the PTC Solutions menu to be easily discovered and deployed right alongside custom solutions. Even the most advanced developers can now house their deployment management process within SC.   As discussed in a previous article, Solution Central forms a necessary part of a mature DevOps pipeline, usually as a set of services within Foundation which finalize and publish the solution to the Solution Central servers. The recommendation to utilize Solution Central from within Foundation remains a best practice for ThingWorx DevOps because the vast majority of solutions benefit from using the ThingWorx APIs, which scan and check for dependencies and proper XML formatting on each included entity.   Packaging and publishing the solution from within ThingWorx is the easiest and most straightforward way recommended by PTC, but it is now possible to publish to Solution Central using a standard API for those who need to publish from Jenkins or other build jobs. If there are legacy extensions, 3rd party tool dependencies, or other customizations within the ThingWorx application, then it may be beneficial to use this new API instead.   The new API also allows for editable extensions and entities within a published solution, though PTC still recommends avoiding this as a general practice. It is usually better for purposes of maintainability and ease of upgrades to just publish the solution again (with an incremented) version each time any changes are made.   How to Use the API Within Solution Central, a new menu option has been added to review the API, with information about the different types of requests and their parameters and responses. To access this within the Solution Central UI, open the help menu and navigate to “Public APIs” (see the image on the right). To see a sample response, select a request type and scroll down to the “Responses” section (shown below). Examples of error responses are also provided, and it’s important to ensure that whatever makes these requests can properly report or log any errors for troubleshooting and maintenance of the DevOps process.   The general steps for making use of this API are as follows: Create the solution resource with a POST Add some files to that solution with PUTs First, create the solution archive with the right Solution Identifiers; this should contain at least one project XML and all of the entities belonging to that ThingWorx project Next, compute MD5 using a tool like DigestUtils on the contents of the archive; this checksum is required for Solution Central Compute the SHA hash on the archive and save it; this will need to be provided along with the archive in the PUT requests Compute MD5 on the hash file also Finally, make the two PUT requests, one for the archive and one for its hash; for example cURL requests, see the Help Center Publish the solution using a PUT So, with a little more work it is now possible to make use of SC in a more custom DevOps process. It is now possible to build JSON or XML solutions using development tools outside of ThingWorx and still publish these customizations to Solution Central. The process of DevOps Management just became more versatile, and with the ease of deployment of DPM and other PTC building blocks as well, ThingWorx is now more accessible and easy to use than ever before.
View full tip
MachNation  Podcast Replay: Enterprise-Specific Implementation Testing a podcast, by Mike Jasperson,  VP of the IoT EDC   MachNation, a company    exclusively de dicated to testing and benchmarking Internet of Things (IoT) platforms, end-to-end solutions, and services, has conducted a recent podcast series featuring our very own Mike Jasperson, Vice President of the IoT Enterprise Deployment Center here at PTC. Performance IoT    is a podcast series that brings together experts who make IoT performance testing and high-resiliency IoT part of their IoT journey. Mike Jasperson's podcast is episode 5 in the series, titled:    Enterprise-Specific Implementation Testing .  Enjoy!
View full tip
Introduction to Digital Performance Management (DPM) Written by: Tori Firewind, IoT EDC   “Digital Performance Management (DPM) is a closed-loop, problem solving solution that helps manufacturers identify, prioritize, and solve their biggest loss challenges, resulting in reduced cost, increased revenue, and improved service levels.” – DPM Help Center What is DPM? Digital Performance Manager (DPM) is an application which improves factory efficiency across a variety of different areas, namely “the four P’s” of Digital Transformation: products, processes, places, and people. Each performance issue in a factory can be mapped to at least one of these improvement categories in a new strategy for Continuous Improvement (CI) founded by PTC.   Figure 1 – Each performance issue in a factory can be mapped to at least one of 4 fundamental improvement categories: products, processes, places, and people. PTC’s new, industry-leading strategy for continuous improvement (CI) in factories is a “best practice” approach, taking the collective knowledge of many customers to form a focused, prescriptive path for success. 11 Closing the Loop Across Products, Processes, People, and Places, Manufacturing Leadership Journal   At PTC, CI in factories is driven by a “best practice”  approach , with ye ars of ex perience in manufacturing solutions combining with the collective knowledge of the many diverse use cases PTC has encountered, to generate a focused, prescriptive path for improvement in any individual factory. Figure 2 – DPM is a closed loop for continuous improvement, a strategy built around industry standard best practices and years of experience.  PTC is also defining new industry standards for OEE analysis by using time as a currency within DPM. This standardization technique improves intuitive impact assessment and allows for direct comparison of metrics (see the Help Center for details on how each metric is calculated).   DPM creates a closed loop for CI, from the monitoring phase performed both automatically and through manual operator input, to the prioritization and analyzation phases performed by plant managers. DPM helps plant managers by tracking metrics of factory performance that often go overlooked by other systems. With Analytics, DPM can also do much of the analysis automatically, finding the root causes much more rapidly. Figure 3 – All levels of the company are involved in solving the same problems effectively and efficiently with DPM. Instead of 100 people working on 100 different problems, some of which might not significantly improve OEE anyway, these same 100 people can tackle the top few problems one at a time, knocking out barriers to continuous improvement together. Production supervisors who manage the entire production line then know which less-than-effective components on the line need help. They can quickly design and redesign solutions for specific production issues. Task management within DPM helps both the production manager and the maintenance engineer to complete the improvement process. Using other PTC tools like Creo and Vuforia make the path to improvement even faster and easier, requiring less expert knowledge from the front-line workers and empowering every level of participation in the digital transformation process to make a direct, measurable impact on physical production.       How Does DPM Work? DPM as an IoT application sits on top of the ThingWorx Foundation server, a platform for IoT development that is extensible and customizable. Manufacturers therefore find they rarely have to rip and replace existing systems and assets to reap the benefits of DPM, which gathers, aggregates, and stores production data (both automatically and through manual input on the Production Dashboard), so that it can be analyzed using time as a currency. DPM also manages the process of implementing improvements (using the Action Tracker) based on the collected data, and provides an easy way to confirm that the improvements make a real difference in the overall OEE (through the Performance Analysis Dashboard). Because the analysis occurs before and after the steps to improve are taken, manufacturers can rest assured that any resources invested on the improvements aren’t done so in vain; DPM is a predictive and prescriptive analysis process.   DPM makes use of an external SQL Server to run queries against collected data and perform aggregation and analysis tasks in the background, on a separate server location than the thing model and ingestion database. This ensures that use cases involving real-time alerts and events, high-capacity ingestion, or others are still possible on the ThingWorx Foundation server.   The IoT EDC is focusing in on DPM alone for a series of  technical briefs which provide insight and expert level recommendations regarding DPM usage and configuration.  Stay tuned into the PTC Community for more updates to come.
View full tip
Distributed Timer and Scheduler Execution in a ThingWorx High Availability (HA) Cluster Written by Desheng Xu and edited by Mike Jasperson    Overview Starting with the 9.0 release, ThingWorx supports an “active-active” high availability (or HA) configuration, with multiple nodes providing redundancy in the event of hardware failures as well as horizontal scalability for workloads that can be distributed across the cluster.   In this architecture, one of the ThingWorx nodes is elected as the “singleton” (or lead) node of the cluster.  This node is responsible for managing the execution of all events triggered by timers or schedulers – they are not distributed across the cluster.   This design has proved challenging for some implementations as it presents a potential for a ThingWorx application to generate imbalanced workload if complex timers and schedulers are needed.   However, your ThingWorx applications can overcome this limitation, and still use timers and schedulers to trigger workloads that will distribute across the cluster.  This article will demonstrate both how to reproduce this imbalanced workload scenario, and the approach you can take to overcome it.   Demonstration Setup   For purposes of this demonstration, a two-node ThingWorx cluster was used, similar to the deployment diagram below:   Demonstrating Event Workload on the Singleton Node   Imagine this simple scenario: You have a list of vendors, and you need to process some logic for one of them at random every few seconds.   First, we will create a timer in ThingWorx to trigger an event – in this example, every 5 seconds.     Next, we will create a helper utility that has a task that will randomly select one of the vendors and process some logic for it – in this case, we will simply log the selected vendor in the ThingWorx ScriptLog.     Finally, we will subscribe to the timer event, and call the helper utility:     Now with that code in place, let's check where these services are being executed in the ScriptLog.     Look at the PlatformID column in the log… notice that that the Timer and the helper utility are always running on the same node – in this case Platform2 , which is the current singleton node in the cluster.   As the complexity of your helper utility increases, you can imagine how workload will become unbalanced, with the singleton node handling the bulk of this timer-driven workload in addition to the other workloads being spread across the cluster.   This workload can be distributed across multiple cluster nodes, but a little more effort is needed to make it happen.   Timers that Distribute Tasks Across Multiple ThingWorx HA Cluster Nodes   This time let’s update our subscription code – using the PostJSON service from the ContentLoader entity to send the service requests to the cluster entry point instead of running them locally.       const headers = { "Content-Type": "application/json", "Accept": "application/json", "appKey": "INSERT-YOUR-APPKEY-HERE" }; const url = "https://testcluster.edc.ptc.io/Thingworx/Things/DistributeTaskDemo_HelperThing/services/TimerBackend_Service"; let result = Resources["ContentLoaderFunctions"].PostJSON({ proxyScheme: undefined /* STRING */, headers: headers /* JSON */, ignoreSSLErrors: undefined /* BOOLEAN */, useNTLM: undefined /* BOOLEAN */, workstation: undefined /* STRING */, useProxy: undefined /* BOOLEAN */, withCookies: undefined /* BOOLEAN */, proxyHost: undefined /* STRING */, url: url /* STRING */, content: {} /* JSON */, timeout: undefined /* NUMBER */, proxyPort: undefined /* INTEGER */, password: undefined /* STRING */, domain: undefined /* STRING */, username: undefined /* STRING */ });   Note that the URL used in this example - https://testcluster.edc.ptc.io/Thingworx - is the entry point of the ThingWorx cluster.  Replace this value to match with your cluster’s entry point if you want to duplicate this in your own cluster.   Now, let's check the result again.   Notice that the helper utility TimerBackend_Service is now running on both cluster nodes, Platform1 and Platform2.   Is this Magic?  No!  What is Happening Here?   The timer or scheduler itself is still being executed on the singleton node, but now instead of the triggering the helper utility locally, the PostJSON  service call from the subscription is being routed back to the cluster entry point – the load balancer.  As a result, the request is routed (usually round-robin) to any available cluster nodes that are behind the load balancer and reporting as healthy.   Usually, the load balancer will be configured to have a cookie-based affinity - the load balancer will route the request to the node that has the same cookie value as the request.  Since this PostJSON  service call is a RESTful call, any cookie value associated with the response will not be attached to the next request.  As a result, the cookie-based affinity will not impact the round-robin routing in this case.   Considerations to Use this Approach   Authentication: As illustrated in the demo, make sure to use an Application Key with an appropriate user assigned in the header. You could alternatively use username/password or a token to authenticate the request, but this could be less ideal from a security perspective.   App Deployment: The hostname in the URL must match the hostname of the cluster entry point.  As the URL of your implementation is now part of your code, if deploy this code from one ThingWorx instance to another, you would need to modify the hostname/port/protocol in the URL.   Consider creating a variable in the helper utility which holds the hostname/port/protocol value, making it easier to modify during deployment.   Firewall Rules: If your load balancer has firewall rules which limit the traffic to specific known IP addresses, you will need to determine which IP addresses will be used when a service is invoked from each of the ThingWorx cluster nodes, and then configure the load balancer to allow the traffic from each of these public IP address.   Alternatively, you could configure an internal IP address endpoint for the load balancer and use the local /etc/hosts name resolution of each ThingWorx node to point to the internal load balancer IP, or register this internal IP in an internal DNS as the cluster entry point.
View full tip
ThingWorx DevOps with Azure: The Comprehensive DevOps Guide Written by Tori Firewind, IoT EDC   As promised in a previous post,  attached here is a comprehensive guide to DevOps in ThingWorx, including tutorials and instructions for creating a continuous integration, continuous deployment (CI/CD) process for application development.  There are also updated scripts and entities attached, including an entire sample application for importing, exporting, and testing an application in ThingWorx. From Docker and Github to Azure DevOps and Solution Central, this guide has it all. Learn how to perform your role in the DevOps process whether an administrator or a developer, automate your deployments and testing, and create a more efficient process  for publication changes to production.  A complete DevOps process like this really does facilitate faster and easier updates with fewer risks, fewer delays, and a better pathway to success.  
View full tip
Persistent vs. Logged Properties By Mike Jasperson, VP of IoT EDC   Executive Summary ThingWorx provides several different “aspects” (or storage options) for how property values are saved.  These options each have different implications for performance and scalability.  Understanding those implications is important for designing a scalable IOT solution.   Persistent Properties are best used for non-telemetry data which will change infrequently (for example only a few times in a day) and where historical values are not required.  When overused, Persistent properties can put significant pressure on the database layer of your ThingWorx implementation, leading to poor performance of your IOT application.  As the number of Things in your IOT application scales up, the quantity or frequency of persistent properties per Thing needs to be carefully considered.   Logged Properties are best used for telemetry data where historical values need to be retained, but also for any other value that is expected to change frequently.  Logged properties can create some additional requirements: a process for handling null/default values after restarts, more disk space, and a data retention policy. There are benefits as well, though, like more flexibility and scalability for the ingestion of larger volumes of data.   Persistent + Logged Properties perform database operations of both aspects.  Combined use should be very limited – only properties that update infrequently (a few times a day), and that must be in-memory in the event of a ThingWorx restart.   In-Memory Only Properties are neither persistent nor logged – they are not stored to the database.  These properties can greatly improve scale for values that need to be available for the application to drive UIs or compute other derived values that will be stored.  However, high-frequency updates of in-memory properties can create scale challenges in HA (high availability) ThingWorx configurations where memory state needs to be constantly shared between multiple ThingWorx nodes.     Find a complete summary as well as example cases in the document attached.
View full tip
ThingWorx DevOps By Victoria Firewind,  IoT EDC This presentation accompanies a recent Expert Session, with video content including demos of the following topics:   found here!   DevOps is a process for taking planned changes through development, through testing, and into production,   where they can be accessed by end users.   One test instance typically has automated tests (integration testing) which ensure application logic is preserved in spite of whatever changes the developers are making, and often there is another test instance to ensure the application is usable (UAT testing) and able   to handle a production load (load testing).   So, a DevOps Pipeline starts with a task manager tasking out planned changes, where each task will become a branch in the repository. Each time a new branch is created, a new pipe is needed, which in this case, is produced by Docker Hub.   Developers then make changes within that pipe, which then flow along the pipe into testing. In this diagram, testing is shown as the valve which when open (i.e. when tests all pass) then   allows the changes to flow along the pipe into production.   A good DevOps process has good flow along the various pipelines, with as much automated or scripted as possible to reduce the chances for errors in deployments.   In order to create a seamless pipeline, whether or not it winds up automated, several third party tools are useful:       b                  Container software is a very good way to improve the maintainability and updatability of a ThingWorx instance, while minimizing the amount of resources needed to host each component.   n  1. Create Docker Image Consult the Help Center if need be. Update your YML file with everything you need before starting the image: see the example in the PTC community.   License the instance using the license management website. Follow the instructions from Docker for installing those tools: Docker itself (docker) and Docker Compose (docker-compose).   n   2. Save Docker Image in Docker Repo Docker Hub has some free options, and if a license is purchased,   can host more than a single Docker image and tag. It is also possible to set up your own Docker registry.           n 3. Access the image in Docker Desktop Download Docker Desktop and sign-in to the Docker account which hosts the repository.   Create some folders for storing the h2.env file and the ThingworxStorage and ThingWorxPlatform mounted folders.   Remember to license these containers as well. Developers login to the license management site themselves and put those into the ThingWorxPlatform mounted folder (“license_capability_response.bin”).         Git is a very versatile tool that can be used through many different mediums, like Azure DevOps or Github Desktop.  To get started as a totally new Git user, try downloading Github Desktop on your local machine and create a local repository with the provided sample code.    This can then be cloned on a Linux machine, presumably whichever instance hosts the integration ThingWorx instance, using the provided scripts (once they are configured).  Remember to install Git on the Linux machine, if necessary (sudo apt-get install git).     A sample ThingWorx application (which is not officially supported, and provided just as an example on how to do DevOps related tasks in ThingWorx) is attached to this post in a zip file, containing two directories, one for scripts and one for ThingWorx entities.    Copy the Git scripts and config file into the top level, above    the repository folder, and update the GitConfig.sh file with the URL for your Git repository and your login credentials. Then these scripts can be used to sync your Linux server with your Git repository, which any developer could easily update from their local machine. This also ensures changes are secure, and enables the potential use of other DevOps procedures like tasks, epics, and corresponding branches of code.     Step s to DevOps using the provided code as an example: Clone the repository into the SystemRepository or any other created repository, use the provided scripts in a Linux environment. Import the DeploymentUtilities entity, which again is scripted for Linux or for use with a development IDE with bash support. Then import the ThingWorx application from source control or use the script (which itself makes use of that DeploymentUtilities entity). Now create some local changes, add things, etc. and try out the UpdateApplication script or export to source control and then push to the Git repo. Data and localization table exports are also possible. Run the tests using the provided IntegrationTester thing or create your own by overriding the IntegrationTestTS thing shape, or use the TestTwxApplication script from a Linux terminal. Design a process for your application which  allows for easy application exports and updates to and from a repository, so that developers can easily send in their changes, which can then be easily loaded and tested in another environment.   In Conclusion: DevOps is a complex topic and every PTC customer will have their own process based around their unique requirements and applications. In the future, more mature pipeline solutions will be covered, ones that involve also publishing to Solution Central for easier deployment between various testing instances and production.        
View full tip
Is your team operating an effective DevOps pipeline? DevOps is an important part of a mature, enterprise ready application, but the process isn’t simple.   This expert session will focus on how a full DevOps pipeline looks like and how PTC can help to build a seamless pipeline. Join us for our upcoming Expert Session to learn how to create a Docker image, integrate Azure with Docker and Git, and set up a seamless DevOps pipeline.   When? Thursday, September 30th 2021 | 11 AM EST Host: Tori FIrewind, Senior Engineer in PTC  IOT Enterprise Deployment Center Registration link: https://www.ptc.com/en/resources/iiot/webcast/devops-pipeline-thingworx 
View full tip
Thundering Herd Scenarios in ThingWorx Written by Jim Klink, Edited by Tori Firewind   Introduction The thundering herd topic is quite vast, but it can be broken down into two main categories: the “data flood” and the “reconnect storm”. One category involves what happens to the business login (the “data flood” scenario) and affects both Factory and Connected Products use cases. The other category involves bringing many, many devices back online in a short time (the “reconnect storm” scenario), which largely influences Connected Products scenarios.   Citation: https://gfp.sd.gov/buffalo-roundup/ Think of Connected Products as a thundering stampede of many small buffalo, which then makes a Factory thundering herd scenario a stampede of a couple massive brontosaurus, much fewer in number, but still with lots of persisted data to send back in. This article focuses in on how to manage the “reconnect storm” scenario, by delaying the return of individual buffalo to reduce the intensity of the stampede. Find here the necessary insights on how to configure your ThingWorx edge ap plications to minimize the e ffect of a server down scenario.    The C-SDK will be used for examples , but the general principles will apply to any of the ThingWorx edge options (EMS, .Net SDK, Java SDK).  This article also references the ExampleAgent application which is built using the C-SDK . The ExampleAgent is available for download as an attachment to this post.  It offers an easily configurable edge solution for Windows and Linux that can be used for the following purposes: Foundation for rapid development of a robust custom edge application based on the ThingWorx C-SDK for use by customers and partners. Full featured, well documented, ‘C’ source code example of developing an application using the ThingWorx C-SDK. A “local” issue is one which affects a single agent, a loss of connectivity due to hardware malfunction or local network issues. Local issues are quite common in the IoT world, and recovery usually isn’t too much of a challenge. A “global” issue occurs when many agents disconnect simultaneously, usually because there is an issue with the ThingWorx server itself (though the Load Balancer, Connection Server, or web hosting software could also be the source). Perhaps it is a scheduled software update, perhaps it is unexpected downtime due to issues, but either way, it’s important to consider how the fleet of agents will respond if ThingWorx suddenly becomes unavailable.   There are two broad issues to consider in a situation like this. One is maintaining the agent’s data so that it can be sent when the connection becomes available again. This can be done in the C-SDK using an offline file storage system, which includes properties, events, and services. Offline storage is configured in the twConfig.h file in the C SDK.  The second issue the number of Agents seeking to reconnect to the server in a short period of time when the server is available.    Of course, if revenue is based on uptime, perhaps persisting data is less critical and can be lost, making things simple. However, in most cases, this data will need to be stored on the edge device until reconnect. Then, once the server comes back up, suddenly all of this data comes streaming in from all of the many edge devices simultaneously.   This flood of both dat a and reconnection of a multitude of agents can create what is called a “thundering herd” scenario, in which ThingWorx can become backlogged with data processing requests, data can be lost if the queues are overwhelmed, or worst-case, the Foundation server can become unresponsive once again. This is when outages become costly and drag on longer than necessary. Several factors can lead to a thundering herd scenario, including the number of agents in the fleet, the amount of stored data per agent, the amount of data ordinarily sent by these devices, which is sent side-by-side with the stored data upon reconnection, and how much processing occurs once all of this data is received on the Foundation server.   The easiest way to mitigate a potential thundering herd scenario, and this is considered a ThingWorx best practice as well, is to randomize the reconnection of devices. Each agent can be configured to delay itself by a random amount of time before attempting to reconnect after a loss of connectivity. This random delay then distributes the number of assets connecting at a time over a longer period, thus minimizing the impact of the reconnections on ThingWorx. There are several configuration settings that help in this regard.   Configuring the Herd (C-SDK) The C-SDK is great at managing agent connectivity, having a lot of options for fine-tuning the connections. The web-socket connection is managed by the SDK layer of the edge device (which also manages the retry process). To review the source code for how connections are made, see the C-SDK file found here: src\api\twApi.c, specifically the function called twApi_Connect().   The ExampleAgent uses custom configuration files to manage this process from the application layer, a more robust and complete solution. Detailed here are the configuration options in the ExampleAgent attached to this post, most of which can be found in its ws_connection.json configuration file: connect_timeout is used throughout the C-SDK as the time to wait for a web-socket connection to be established (i.e. the ‘timeout’ value). This is the maximum delay for the socket to be established or to send and receive data. If it is established sooner, then a success code is returned. If a connection is not established in the configured timeout period, then an error is returned. Setting this value to 10 seconds is reasonable, for reference. connect_retries is the number of times the SDK will attempt to establish a connection before the twApi_Connect() function returns an error. Setting this to -1 will trigger the SDK to stay in the loop infinitely until a connection is established. connect_retry_interval is the delay between connection retries. max_connect_delay is used as a delay before even entering the loop, that which uses the connect_retries and connect_retry_interval parameters to establish the connection. The SDK function twAddConnectionDelay() is called, which delays by a random amount of time between 0 seconds and the value given by this parameter. This random delay is only used once per call to twApi_Connect().  This is therefore the parameter most critical to preventing thundering herd scenarios (as discussed above). Configuring the SDK agents to reconnect in this way is critical, but there are also some drawbacks, namely that while the twApi_Connect() function is running, there is no clean way of shutting the agent down. Likewise, the agent only does ONE randomized delay per call of the twApi_Connect() function, meaning that if reconnection cannot occur immediately, it’s still possible for many agents to try to reconnect at once. Consider this when determining what values to assign to these parameters.   ExampleAgent Design The ExampleAgent provided here is a fully implemented, configurable application, like the EMS in terms of functionality, but containing only simulated data. The data capture component is missing here and has to be custom developed. Attached alongside this source code is extensive documentation that explains how to get the application set up and configured. This isn’t meant to be used directly in a production environment.   Please note that the ExampleAgent is provided as-is; it is not an officially released product by PTC.   This disclaimer includes the ExampleAgent source code, build process, documentation, deliverables as well as any ExampleAgent modifications to the official releases of the C-SDK or the SCM extension product. Full and sole responsibility for the use, deployment, reliability, and accuracy of any ExampleAgent related code, documentation, etc. falls to the user, and any use of the ExampleAgent is an implicit agreement with this disclaimer.   The ExampleAgent was developed by PTC sales and services to help in the Edge application development process.  For assistance, support, or additional development, an authorized statement of work is needed.  Please Note:  PTC support is not aware of the existence of the ExampleAgent and cannot provide assistance.    Because of the small downside to configuring the twApi_Connect() function directly as discussed above, there is alternative approach given here as well. The ExampleAgent module ConnectionMgr.c controls the calling of the twApi_Connect() on a dedicated connection thread. The ConnectionMgrThreadFunction() contains the source code necessary to understanding this process.   The ConnectionMgr.c workflow and source code visualization via Microsoft Visual Studio are in the diagrams below. The ExampleAgent defines its own randomized delay to mitigate the thundering herd scenario while still deploying an edge system that responds to shut down requests cleanly. In this case, the randomized delay is configured by the parameter reconnect_random_delay_seconds in the agent_config.json file. Since the ConnectionMgrThreadFunction() controls the calling of twApi_Connect(), the ConnectionMgrThreadFunction() will delay the randomized value EVERY time before calling this reconnect function. A separate thread is created to call the reconnect function so that there are still resources available for data processing and to check for shutdown signals and other conditions.   Recommended Values These recommendations are based around managing the reconnection process from the application layer. These may be different if the C-SDK is configured directly, but creating application layer management is recommended and provided in the ExampleAgent attached. The ExampleAgent is configured by default to simplify the SDK layer’s involvement.   These configuration options tell the SDK layer to try to connect just once, after just 1 second: There is no official recommendation for the above values due to the fact that every use case is different and will require different fine tuning to work well.   Then this setting here handles the retry process from within the application layer of the ExampleAgent: Conclusion To reduce the chances of a thundering herd scenario, configure the fleet to reconnect after differing random delays. The larger the random delay times, the longer it takes for the fleet to come back online and fleet data to be received. While more complex ThingWorx deployment architectures (such as container-based deployments like Kubernetes or Thingworx High Availability (HA) clusters) can also help to address the increased peak load during a thundering herd event, randomized reconnect delays can still be an effective tool.        
View full tip
ThingWorx Docker Overview and Pitfalls to Avoid    by Tori Firewind of the IoT EDC Containers are isolated and can run side-by-side on the same machine, but they share the host OS, making them more efficient in terms of memory usage and scalability.   Docker is a great tool for deploying ThingWorx instances because eve rything is pre-packaged within the Docker image and can be stored in a repository ready for deployment at any time with little configuration required.  By using a different container for every component of an application, conflicting dependencies can be avoided. Containers also facilitate the dev ops process, providing consistent application deployments which can be set up, taken down, and tested automatically using scripts.   Using containers is advantageous for many reasons: simplified configuration, easier dev ops management, continuous integration and deployment, cost savings, decreased delivery time for new application versions, and many versions of an application running side-by-side without any wasted resources setting them up or tearing them down. The ThingWorx Help Center is a great resource for setting up Docker and obtaining the ThingWorx Docker files from the PTC Software Downloads website. The files provided by PTC handle the creation of the image entirely, simplifying the process immensely. All one has to do is place the ThingWorx version and all of the required dependencies in the staging folder, configure the YML file, and run the build scripts. The Help Center has all of the detailed information required, but there are a few things worth noting here about the configuration process.   For one thing, the platform-settings.json file is generated based on the options given in the YML file, so configuration changes made within this configuration file will not persist if the same options aren’t given in the YML file. If using Docker Desktop to run an image on a Windows machine, then the configuration options must be given in an ENV file that can be referenced from the command used to start the image. The names of the configuration parameters differ from the platform-settings.json file in ways that are not always obvious, and a full list can be found here.   For example, if extension imports need to be enabled on a ThingWorx instance running in Docker, then the EXTPKG_IMPORT_POLICY_ENABLED option must be added to the environment section of the YML file like this:     environment: - "CATALINA_OPTS=-Xms2g -Xmx4g" # NOTE: TWX_DATABASE_USERNAME and TWX_DATABASE_PASSWORD for H2 platform must # be set to create the initial database, or connect to a previous instance. - "TWX_DATABASE_USERNAME=dbadmin" - "TWX_DATABASE_PASSWORD=dbadmin" - "EXTPKG_IMPORT_POLICY_ENABLED=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JARRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_CSSRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_JSONRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_WEBAPPRES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_ENTITIES=true" - "EXTPKG_IMPORT_POLICY_ALLOW_EXTENTITIES=true" - "EXTPKG_IMPORT_POLICY_HA_COMPATIBILITY_LEVEL=WARN" - "DOCKER_DEBUG=true" - "THINGWORX_INITIAL_ADMIN_PASSWORD=Pleasechangemenow"   Note that if the container is started and then stopped in order for changes to the YML file to be made, the license file will need to be renamed from "successful_license_capability_response.bin" to "license_capability_response.bin" so that the Foundation server can rename it. Failing to rename this file may cause an error to appear in the Application Log, and the server to act as if no license was ever installed: "Error reading license feature info for twx_realtime_data_sub".   In Docker Desktop on a Windows machine, create a file called whatever.env and list the parameters as shown here: Then, reference this environment file when bringing up the machine using the following command in Powershell:      docker run -d --env-file h2.env -p 8080:8080 -v ${pwd}/ThingworxPlatform:/ThingworxPlatform -v ${pwd}/ThingworxStorage:/ThingworxStorage -it <image_id>     Notice in this command that the volumes for the ThingworxPlatform and ThingworxStorage folders are specified with the “-v” options. When building the Docker image in Linux, these are given in the YML file under the volumes section like this (only change the path to local mount on the left side of the colon, as the container mount on the right side will never change):      volumes: - ./ThingworxPlatform:/ThingworxPlatform - ./ThingworxStorage:/ThingworxStorage - ./tomcat-logs:/opt/apache-tomcat/logs     Specifying the volumes this way allows for ThingWorx logs and configuration files to be accessed directly, a crucial requirement to debugging any issues within the Foundation instance. These volumes must be mapped to existing folders (which have write permissions of course) so that if the instance won’t come up or there are any other issues which require help from Tech Support, the logs can be copied out and shared. Otherwise, the Docker container is like a black box which obscures what is really going on. There may not be any errors in the Docker logs; the container may just quit without error with no sign of why it won’t stay up. Checking the ThingWorx and Tomcat logs is necessary to debugging, so be sure to map these volumes correctly.   Once these volumes are mapped and ThingWorx is successfully making use of them, adding a license file to the Docker instance is simple. Use the output in the ThingworxPlatform folder to obtain the device ID, grab a valid license file, and put it right back into that ThingworxPlatform folder, exactly the same way as on a regular instance of ThingWorx. However, if the Docker image is being used for a dev ops process, a license may not be necessary. The ThingWorx instance will work and allow development for a time before the trial license expires, which normally will be enough time for developers to make their changes, push those changes to a repository, and tear the container down.   Another thing worth noting about ThingWorx Docker image creation is that the version of Java supplied in the staging folder must match the compatibility requirements for each version of ThingWorx. This is the version of Java used by the container to run the Foundation server. In versions of ThingWorx 9.2+, this means using the Amazon Corretto version of Java. The image absolutely will not start ThingWorx successfully if older versions of Java are used, even if the scripts do successfully build the image.   Also note that in the newer versions of ThingWorx Docker, the ThingWorx Foundation version within the build.env file is used throughout the Docker image creation process. Therefore, while the archive name can be hard-coded to whatever is desired, the version should be left as is, including any additional specifications beyond just the version number. For example, the name of the archive can be given as Thingworx-Platform-H2-9.2.0.zip (a prettier version of the archive name than is used by default), but the PLATFORM_VERSION should still be set to 9.2.0-b30 (which should be how it appears within the build.env file upon download of the ThingWorx Docker files).   Paying attention to every note in the Help Center is critically important to using ThingWorx Docker, as the process is extensive and can become very complicated depending on how the image will be used. However, as long as the volumes are specified and the log files accessible, debugging any issues while bringing up a Docker-contained ThingWorx instance is fairly straightforward.     Credits: Images borrowed from ThingWorx Docker Containerization Tech Talk by Adrian Petrescu
View full tip
5 Common Mistakes to Developing Scalable IoT Applications by Tori Firewind and the IoT EDC Team Introduction To build scalable applications, it’s necessary to identify common mistakes and avoid them at the early stages of development. In an expert session this past month, the PTC Enterprise Deployment Team elaborated on why scalability is important and how to avoid the common development pitfalls in IoT. That video presentation has been adapted here for visual consumption of the content as well.   What is Scalability and Why Does it Matter Enterprise ready applications can scale and easily be maintained, which is important even from day 1 because scalability concerns are the largest cause for delays to Go Lives.  Applications balance many competing requirements, and performance testing is crucial to ensure an application is ready for Go Live. However, don't just test how many remote assets can connect at once, but also any metrics that are expected to increase in time, like the number of remote properties per thing, the frequency of reporting from those properties, or the number of users accessing the system at once. Also consider how connecting more assets will affect the user experience and business logic, and not just the ability to ingest data.   Common Mistake 1: Edge Property Updates Because ThingWorx is always listening for updates pushed from the Edge and those resources are always in use, pulling updates from the Foundation side wastes resources. Fetch from remote every read is essentially a round trip, so it's slower and more memory intensive, but there are reasons to do it, like if the quality tag is needed since the cache doesn't store it. Say a property is pushed at 11:01, and then there's a network issue at 11:02. If the property is pulled from the cache, it will pull the value sent at 11:01 without any indication of there being a more recent value on the Edge device. Most people will use the default options here: read from server cache, which relies on the Edge to push updates, and the VALUE push type, and configuring a threshold is a good idea as well. This way, only those property updates which are truly necessary are sent to the Foundation server. Details on property aspects can be found in KCS Article 252792.   This is well documented in another PTC Community post. This approach is necessary and considered a best practice if there is event logic which depends on multiple properties at once. Sending all of the necessary properties to determine if an event should fire in one Infotable ensures there is no need to query the database each time a property update comes in from the Edge, which ensures independent business logic and reduces the load on the database to improve ingestion performance. This is a very broad topic and future articles will address it more specifically. The When Disconnected property aspect is a good way to configure what happens with Edge property values in a mass disconnect scenario. If revenue depends on uptime, consider losing any data that changes while a device is disconnected. All of the updates can be folded into a single value if the changes themselves aren't needed but an updated value is needed to populate remote properties upon reconnect. Many customers will want to keep all of their data, even when a device is offline and use data stores. In this case, consider how much data each Edge device can store (due to memory limitations on the devices themselves), and therefore how long an outage can last before data is lost anyway. Also consider if Foundation can handle massive spikes in activity when this data comes streaming in. Usually, a Connection Server isn't enough. Remember that the more data needs to be kept, the greater the potential for a thundering herd scenario.   Handling a thundering herd scenario goes beyond sizing considerations. It is absolutely crucial to randomize the delay each device will wait before attempting to reconnect. It should be considered a requirement to have the devices connect slowly and "ramp up" over time for multiple reasons. One is that too much data coming in too fast could overwhelm the ingestion queue and result in data loss. Another is that the business logic could demand so many system resources, that the Foundation server crashes again and again and cannot be recovered. Turning off the business logic it isn't possible if the downtime is unexpected, so definitely rely instead on randomized reconnection times for Edge devices.   Common Mistake 2: Overlooking Differences in HA To accommodate a shared thing model across many servers, changes had to be made in how the thing model is stored and the model tree is walked by the Foundation servers. Model information is no longer cached at the Thing level, and the model tree is therefore walked every time model information is needed, so the number of times a Thing is directly referenced within each service should be limited (see the Help Center for details).   It's best to store whatever information is needed from a Thing in an Infotable, making the Things[thingName] reference a single time, outside of any loops. Storing the property definitions outside of the loop prevents the repetitious Thing references within the service, which otherwise would have occurred twice for each property (for both the name and the description), and then again for every single property on the Thing, a runtime nightmare.   Certain states previously held in memory are now shared across the cluster, like property values, Thing states, and connection statuses. Improvements have been made to minimize the effects of latency on queries, like how they now only return property values on associated Thing Shapes or Thing Templates. Filtering for properties on implementing Things is still possible, but now there is a specific service to do it, called GetThingPropertyValues (covered in detail in the Help Center).    In the script shown above, the first step is a query to get the names of all implementing things of a particular Thing Shape. This is done outside of any loops or queries, so once per service call. Then, an Infotable is built to store what would have been a direct reference to each thing in a traditional loop. This is a very quick loop that doesn't add much by way of runtime since it is all in memory, with no references to the thing model or the database, instead using the results of the first query to build the Infotable. Finally, this thing reference Infotable is passed into the new service GetThingPropertyValues to retrieve all of the property info for all of these things at once, thereby only walking the thing model once. The easiest mistake people would make here is to do a direct thing reference inside of a loop, using code like Things[thingName].Get() over and over again, thereby traversing the thing model repeatedly and adding a lot of runtime. QueryImplementingThingsOptimized is another new service with new parameters for advanced configuration. Searches can now be done on particular networks or to particular depths, and there's an offset parameter that allows for a maximum number of items to be returned starting at any place in the list of Things, where previously if you needed the Things at the end of the list, you had to return all of the Things. All of these options are detailed in the Help Center, as well as the restrictions listed in the image above.    Common Mistake 3: Async Service Misuse   Async services are sometimes required, say if a user has to trigger many updates on many remote things at once by the click of a button on a mashup that should not be locked up waiting for service completion. Too many async service calls, though, result in spikes in activity and competition for resources. To avoid this mistake, do not use async unless strictly necessary, and avoid launching too many async threads in parallel. A thread dump will show how many threads there are and what they are doing.   Common Mistake 4: Thread Pool Overload Adding more threads to the pool may be beneficial in certain circumstances, like if the threads are waiting on other resources to complete their tasks, look stuff up in the database (I/O), or unlock data that can only be accessed one thread at a time (property writes). In this case, threads are waiting on other resources, and not the CPU, so adding more threads to the pool can improve performance. However, too many threads and performance degradation will occur due to increased contention, wasted CPU cycles, and context switches.   To check if there are too many or not enough threads in the pool, take thread dumps and time the completion of requests in the system. Also watch the subsystem memory usage, and note that the side of the queue should never approach the max. Also consider monitoring the overall performance of the system (CPU and Memory) with a tool like Grafana, and remember that a good performance test properly exercises all of the business logic and induces threads in a similar way to real world expectations.   Common Mistake 5: Stream Etiquette Upserts, or updates to database tables, are expensive operations that can interfere with ingestion if they are performed on the wrong tables. This is why Value Stream and Stream data should never be updated by end users of the application. As described in the DGIS document on best practices, aggregation is the key to unlocking optimal performance because it reduces the size of database tables that require upserts. Each data structure shown here has an optimal use in a well-designed ThingWorx application.   Data Tables are great for storing overview information on all of the Things in one view, and queries on this data source are the fastest. Update this data source as often as possible (by timer), allowing enough time for updates to be gathered and any necessary calculations made. Data Tables can also be updated by end users directly because each row locks one at a time during updates. Data Tables should be kept as small as possible to improve performance on mashups, so for instance, consider using one to show all Things per region if there are millions of Things. Roll up information is best stored here to avoid calculations upon mashup load, and while a real-time view of many thousands of things at once is practically impossible, this option allows for a frequently updates overview of many things, which can also drill down to other mashup views that are real-time for one Thing at a time.   Value Streams are best used for data ingestion, and queries to these should be kept to a minimum, largely performed by the roll up logic that populates the Data Tables mentioned above. Queries that chart all of the data coming in are best utilized on individual Thing views so that only a handful of users are querying the same data sources at a time. Also be sure to use start and end dates and make use of the "source" field to improve query performance and create a better user experience. Due to the massive size of the corresponding database tables, it's best to avoid updating Value Streams outside of the data ingestion process altogether.   Streams are similar, but better for storing aggregated, historical data. Usually once per day or per week (outside of business hours if possible), Value Stream data will be smoothed or reduced into less data points and then stored into Streams. This allows for data to be stored for longer periods of time on the server without using up as much memory or hurting query performance. Then the high volume ingested data sources can be purged frequently, as discussed below.   Infotables are the most memory intensive, and are really designed to hold only a small number of rows at a time, usually to facilitate the business logic. Sometimes they will be stored in Streams or Data Tables if they aren't expected to grow larger (see the DGIS Coffee Machine App for an example). Infotables should never be logged; if they are used to transmit Edge property updates (like in the Property Set Approach), they should be processed into other logged (usually local) properties.   Referring to the properties themselves is how to get real-time information on a mashup, say by using the GetProperties service and its auto-update option, which relies on internal websockets. This should be done on individual Thing views only, and sizing considerations need to be made if there will be many of these websockets open at once, say if there are many end users all viewing real-time data at a time.   In the newer versions of ThingWorx, these cannot be updated directly, so find the system object called ThingWorxPersistenceProvider and use the service UpdateStreamDataProcessingSettings. ThingWorx Foundation processes data received from remote devices in batches in order to manage the data flow and reduce database churn. All of these settings configure how large those batches are and how frequently they are flushed to the database (detailed in full in KCS Article 240607). This is very advanced configuration that heavily depends on use case and infrastructure, but some info applies to most people: adjusting the scan rate is usually not beneficial; a healthy queue should never approach the max limit; and defaults differ by database because they function differently. InfluxDB generally works better when there are less processing threads and higher numbers of things per thread, while PostgresDB can have a lot of threads, preferably with less things per thread. That's why the default values shown here are given as the same number of threads (and this can be changed), but Influx has a larger block size and size threshold because it can handle more items per thread. Value Streams ingest all data into the Foundation server, and so the database tables that correspond with these data sources grow very large, very quickly and need to be purged often and outside of business hours, usually once a day or once per week. That's why it's important to reduce the data down to less points and push them into Streams for historical reference. For a span of years, consider a single point a day might be enough, for a span of hours, consider a data point a minute. Push aggregated data into Streams and then purge the rest as soon as it is no longer needed.   In Conclusion
View full tip
Unlocking the Power of Industrial Data Presentation by Mike Jasperson, VP of the IoT Enterprise Deployment Center   his video presentation was performed at the Digital Transformations in Manufacturing conference of 2021, hosted by Enterprise Digital. In this presentation, Mike Jasperson goes over the benefits to modernizing and consolidating access to time-stamped data that is ingested from equipment and sensors into a central location like ThingWorx. Moving away from monolithic, legacy, and siloed systems, and towards more agile solutions, has never been more critical in order to increase machine, operational, and business efficiencies while also opening up visibility into data systems and infrastructure deployments.   This video partners with InfluxData to help customers extract value from IoT data systems, maximizing both performance and operational capabilities of their monitoring systems. To stay competitive in the IoT market, it's important to review the best practices for scaling and testing your industrial metrics solutions, as well as how to get the best performance out of your digital data solutions by using time-series optimized databases like InfluxDB. Open source technologies discussed here are a great way to create modular and upgradable solutions and accelerate IoT innovation.     (view in My Videos)
View full tip
Interested in learning how others using and/or hosting ThingWorx solutions can comply with various regulatory and compliance frameworks?   Based on inquiries regarding the ability of customers to meet a wide range of obligations – ranging from SOC 2 to ISO 27001 to the Department of Defense’s Cybersecurity Maturity Model Certification (CMMC) – the PTC's IoT Product Management and EDC teams have collaborated on a set of detailed articles explaining how to do so.   Please check out the   ThingWorx Compliance Hub   (support.ptc.com login required) for more information!
View full tip
The New and Improved DGIS Guide to ThingWorx Development Written by Victoria Firewind of the IoT EDC   The classic Developing Great IoT Solutions guide has been reskinned and revamped for newer versions of ThingWorx! The same information on how to build a quality IoT application is now available for versions of ThingWorx 9.1+, and now, a complete sample application is included to demonstrate these ideas.    Find within the attached archive a PDF with high-level overview information on development and application design geared towards managers and business users, so that everyone can understand the necessary requirements, common terms, and key tips on how to ensure an application is scalable and maintainable right from the very start. Reduce your chances of running into issues between PoC and Go Live by reviewing this information today!   Also find within this PDF a series of tutorials which teach not just how to use the ThingWorx software, but which also educate on how to make good application design choices. A basic rules engine for sending real-time notifications is included here, as well as a complete demo application which illustrates each concept in a real-world use case. This Coffee Machine Demo App relies upon the tutorial entities, which can also now be imported directly using the other XML files provided here. This ensures that anyone can review these concepts, regardless of how much time one can commit or how much knowledge one already has on the subject.   This is a complex guide, and any issues, questions, or bugs found within can be reported right here on this thread. Happy developing from the IoT EDC!
View full tip
By Tim Atwood and Dave Bernbeck, Edited by Tori Firewind Adapted from the March 2021 Expert Session Produced by the IoT Enterprise Deployment Center The primary purpose of monitoring is to determine when your application may be exhausting the available  resources. Knowledge of the infrastructure limits help establish these monitoring boundaries, determining  straightforward thresholds that indicate an app has gone too far. The four main areas to monitor in this way are CPU, Memory, Networking, and Disk.   For the CPU, we want t o know how many cores are available to the application and potentially what the temperature is for each or other indicators of overtaxation. For Memory, we want to know how much RAM is available for the application. For Networking, we want to know the network throughput, the available bandwidth, and how capable the network cards are in general. For Disk, we keep track of the read and write rates of the disks used by the application as well as how much space remains on those.   There are several major infrastructure categories which reflect common modes of operation for ThingWorx applications. One is Bare Metal, which relies upon the traditional use of hardware to connect directly between operating system and hardware, with no intermediary. Limits of the hardware in this case can be found in manufacturing specifications, within the operating system settings, and listed somewhere within the IT department normally. The IT team is a great resource for obtaining these limits in general, also keeping track of such things in VMware and virtualized infrastructure models.   VMware is an intermediary between the operating system and the hardware, and often its limits are determined based on the sizing of the application and set by the IT team when the infrastructure is established. These can often be resized as needed, and the IT team will be well aware of the limits here, often monitoring some of the performance themselves already. This is especially so if Cloud Providers are used, given that these are scaled up virtualizations which are configured in easy-to-use cloud portals. These two infrastructure models can also be resized as needed.   Lastly Containers can be used to designate operating system resources as needed, in a much more specific way that better supports the sharing of resources across multiple systems. Here the limits are defined in configuration files or charts that define the container.   The difficulties here center around learning what the limits are,  especially in the case of network and disk usage. Network bandwidth can fluctuate, and increased latency and network congestion can occur at random times for seemingly no reason. Most monitoring scenarios can therefore make due with collecting network send and receive rates, as well as disk read and write rates, performed on the server.   Cloud Providers like Azure provide  VM and disk sizing options that allow you to select exactly what you need, but for network throughput or network IO, the choices are not as varied. Network IO tends to increase with the size of the VM, proportional to the number of CPU cores and the amount of Memory, so this may mean that a VM has to be oversized for the user load, for the bulk of the application, in order to accommodate a large or noisy edge fleet. The next few slides list the operating metrics and common thresholds used for each. We often use these thresholds in our own simulations here at PTC, but note that each use case is different, and each situation should be analyzed individually before determining set limits of performance.   Generally, you will want to monitor: % utilization of all CPU cores, leaving plenty of room for spikes in  activity; total and used memory, ensuring total memory remains constant throughout and used memory remains below a reasonable percentage of the total, which for smaller systems (16 GB and lower) means leaving around 20% Memory for the OS, and for larger systems, usually around 3-4 GB.    For disks, the read and write rates to ensure there is ample free space for spikes and to avoid any situation that might result in system down time;  and for networking, the send and receive rates which should be below 70% or so, again to leave room for spikes.   In any monitoring situation, high consistent utilization  should trigger concern and an investigation into  what’s happening. Were new  assets added?  Has any recent change caused regression or other issues?    Any resent changes should be inspected and the infrastructure sizing should be considered as well.  For ThingWorx specific monitoring, we look at max queue sizes, entries performed, pool sizes, alerts, submitted task counts, and anything that might indicate some kind of data loss. We want the queues to be consistently cleared out to reduce the risk of losing data in the case of an interruption, and to ensure there is no reason for resource use to build up and cause issues over time. In order for a monitoring set-up to be truly helpful, it needs to  make certain information easily accessible to administrative users of the application. Any metrics that are applicable to performance needs to be processed and recorded in a location that can be accessed quickly and easily from wherever the admins are. They should quickly and easily know the health of the application from a glance, without needing to drill down a lot to be made aware of issues. Likewise, the alerts that happen should be  meaningful, with minimal false alarms, and it is best if this is configurable by the admins from within the application via some sort of rules engine (see the DGIS guide, soon to be released in version 9.1). The  monitoring tool should also be able to save the system history and export it for further analysis, all in the name of reducing future downtime and creating a stable, enterprise system.     This dashboard (above) is a good example of how to  rollup a number of performance criteria into health indicators for various aspects of the application. Here there is a Green-Yellow-Red color-coding system for issues like web requests taking longer than 30s, 3 minutes, or more to respond.   Grafana is another application used for monitoring internally by our team. The easy dashboard creation feature and built-in chart modes make this tool  super  easy  to get started with, and certainly easy to refer to  from a central location over time. Setting this up is  helpful for load testing and making ready an application, but it is also beneficial for continued monitoring post-go-live, and hence why it is a worthy investment. Our team usually builds a link based on  the start and end time of tests for each simulation  performed, with all of the various servers being  monitored by one Grafana server, one reference point.   Consider using PTC Performance Advisor to help  monitor these kinds of things more easily (also called DynaTrace ). When most administrators think of monitoring, they think of reading and reacting to dashboards, alerts, and reports. Rarely does the idea of benchmarking come to mind as a monitoring activity, and yet, having successful benchmarks of system performance can be a crucial part of knowing if an application is functioning as expected before there are major issues. Benchmarks also look at the response time of the server and can better enable  tracking of actual end user experience. The best  option  is to automate such tests using JMeter or other applications, producing a daily snapshot of user performance that can anticipate future issues and create a more reliable experience for end users over time.   Another tool to make use of is JMeter, which has the option to build custom reports. JMeter is good for simulating the user load, which often makes up most of the server load of a ThingWorx application, especially considering that ingestion is typically optimized independently and given the most thought. The most unexpected issues tend to pop up within the application itself, after the project has gone live.   Shown here (right) is an example benchmark from a Windchill application, one which is published by PTC to facilitate comparison between optimized test  systems and real life performance.  Likewise, DynaTrace is depicted here, showing an automated baseline (using Smart URL  Detection) on Response Time (Median and 90th percentile) as well as Failure Rate. We can also look at Throughput and compare it with the expected value range based on historical throughput data. Monitoring typically increases system performance  and  availability, but its other advantage is to provide faster, more effective troubleshooting. Establish a systematic process or checklist to step through when problems occur, something that is organized to be done quickly, but still takes the time to find and fix the underlying problems. This will prevent issues from happening again and again and polish the system periodically as problems occur, so that the stability and integrity of the system only improves over time. Push for real solutions if possible, not band-aids, even if more downtime is needed up front; it is always better to have planned downtime up front than unplanned downtime down the line. Close any monitoring gaps when issues do occur, which is the valid RCA response if not enough information was captured to actually diagnose or resolve the issue.   PTC Tech Support developed a diagnostic data gathering query for Oracle that customers can use, found in our knowledgebase. This is an example of RCA troubleshooting that looks at different database factors, reporting on which queries perform the worst  based on inputted criteria. Another example of troubleshooting is for the Java JVM, where we look at all of the things listed here (below) in an automated, documented process that then generates a report for easy end user consumption.   Don’t hesitate to reach out to PTC Technical Support in advance to go over your RCA processes, to review benchmark discrepancies between what PTC publishes and what your real-life systems show, and to ensure your monitoring is adequate to maintain system stability and availability at all times.  
View full tip
How to Scale Vertically and Horizontally, and When to Use Sharding Written by Mike Jasperson, VP of IOT EDC   Deployment architecture describes the way in which an IOT application is deployed, or where each of the components are hosted on the network. There are deployment architecture considerations to make when scaling up an application. Each approach to deployment expansion can be described by the “eggs in a basket” analogy: vertical scale is like one person carrying a bigger basket, horizontal scale is like one person carrying more baskets, and sharding is like more than one person carrying the baskets (see below).   All of these approaches result in the eggs getting from point A to point B (they all satisfy the use case), but the simplest (vertical scale) is not necessarily the best. Sure, it makes sense on paper for one person to carry everything in one big basket, but that doesn’t ensure that all of the eggs arrive intact. Selecting the right deployment architecture is a way to ensure the use cases are satisfied in the best and most efficient ways possible, with the least amount of application downtown or data loss.   Vertical Scale - a.k.a. "one person carrying a bigger basket" The most common scalability approach is to simply size the IOT server larger, or scale up the server. This might mean the server is given additional CPU cores, faster CPU clock speeds, more memory, faster disks, additional network bandwidth or improved network cards, and so on. This is a very good idea  when the application logic is increased in complexity, when more data is therefore needed in memory at a time, or when the processing of said data has to occur as quickly as possible. For example, adding additional devices to the fleet increases the size of the “Thing Model” in the process and will require additional heap memory be available to the Foundation server.   However, there are limitations to this approach. Only so many concurrent operations and threads can be performed at once by a single server. Operations trying to read and write to the disks at once can introduce bottlenecks and reduce server performance. Likewise, “one person, one basket” introduces a single-point-of-failure operating risk. If for some reason the server’s performance does degrade or cease altogether, then all of the “eggs” go down with it. Therefore, this approach is important, but usually not sufficient on its own for empowering an enterprise level deployment.   Horizontal Scale - a.k.a. "one person, with two or more baskets" As of ThingWorx version 9.0, Foundation servers can be deployed in clusters, meaning more baskets to carry the eggs. More baskets means that if even one of these servers is active, the application remains up in the event of an individual node failure or maintenance. So clustered deployments are those which facilitate High Availability.   Clustered servers save on some resources, but not others. For instance, every server in a cluster will need to have the same amount of memory, enough to store the entire Thing Model. Each of the multiple baskets in our analogy has to have the same type of eggs. One basket can’t have quail eggs if the rest have chicken eggs. So, each server has to have an identical version of the application, and therefore enough memory to store the entire application.   Also keep in mind that not all application business logic can scale horizontally. Event queues are local to each ThingWorx node, so the events generated within each node are processed locally by that particular node, and not the entire network (examples are timer and scheduler-based activities). Likewise, data ingestion done through an extension or other background process, like MQTT, emits events within a node that therefore must be processed by that particular node, since that's where the events are visible. On the other hand, load distribution that happens external to ThingWorx in either the Connection Server (for AlwaysOn based data coming from ThingWorx SDKs, EMS, eMessage agents, or Kepware) or REST API calls through a load balancer (i.e. user activity) will be distributed across the cluster, facilitating greater scaling potential in terms of userbase and mashup complexity. Also note that batched data will be processed by the node that received it, but different batches coming through a connection server or load balancer will still be distributed.   Another consideration with clusters pertains to failure modes. While each node in the cluster shares a cache for many things, Stream and Value Stream queues are only stored locally. In the event of a node failure, other nodes will pick up subsequent requests, but any activity already queued on the failed node will be lost. For use cases where each and every data point is critical, it is important to size each node large enough (in other words, to vertically scale each node) such that queue sizes are constantly kept low and the data within them processed as quickly as possible. Ensuring sufficient network and database throughput to handle concurrent writes from the many clustered nodes is key as well.   Once each node has enough resources to handle local queues, the system is highly available with low risk for outages or data loss. However, when multiple use cases become necessary on single deployments, horizontal scaling may no longer be enough to ensure things run smoothly. If one use case is logic-heavy, something non-time-critical which processes data for later consumption, it can use too many resources and interfere with other, lighter but more time-critical use cases. Clustering alone does not provide the flexibility to prioritize specific operations or use cases over others, but sharding does.   Sharding - a.k.a. "more than one person carrying baskets" “Sharding” generally refers to breaking up a larger IOT enterprise implementation into smaller ones, each with its own configuration and resources. More server maintenance and administration may be required for each ThingWorx implementation, but the reduction in risk is worth it. If each of the use cases mentioned above has its own implementation, then any unexpected issues with the more complex, analytical logic will not affect the reaction time of operators to time-sensitive matters in the other use case. In other words, “don’t keep all your eggs in one basket”.   The best places to break up an implementation lie along logical boundaries already accepted by the business. Breaking things down in other ways might look nice on paper, but encouraging widespread adoption in those cases tends to be an uphill battle.   In connected products use cases, options for boundaries could be regional, tied more towards business vertical, or centered around different products or models.  These options can be especially beneficial when data needs to stay in particular countries or regions due to regulatory requirements. In connected operations use cases, the most common logical boundaries would be site-based, with smaller IOT implementations serving just a smaller number of related factories in a particular area. Use-case or product-line boundaries can also make sense here, in-line with the above comments about keeping production-critical or time-sensitive use cases isolated from interference from business-support and analysis use cases.     Ideally, a shard model will put the IOT implementation “closer” to both the devices communicating with it and the users that interact with the data. This minimizes the amount of data to be sent or received over long distances, reducing the impact from bandwidth and latency on performance. When determining which approach is best, consider that smaller, more focused implementations offer more flexibility, but are harder to manage. Having different versions of the same applications deployed in multiple places can easily become a maintainability nightmare. It’s therefore best not to combine a regional model with a use case model when it comes to determining sharding boundaries. Also consider using deployment automation tools like Solution Central. These enable tracking and managing version-controlled deployments to multiple IOT implementations, whether they are deployed on-premise, or in the cloud, giving one central location of all source code.   Another benefit to sharding is the focused investment of server resources in a more targeted way. For instance, if one region is larger than another, it may need more CPU and memory. Or, perhaps only part of an application requires High Availability, the time-critical use cases which are best suited to small, clustered deployments. The larger, analysis-centered use cases can then remain non-clustered (but still vertically scaled of course). Sharding can also make access control simpler, as those who need access to only one region or use case can just be given a user account on that particular shard.   However, certain use cases need data from more than one shard in order to operate, turning the data storage and access control benefits into challenges. Luckily, ThingWorx has an excellent toolkit for overcoming such issues. For one thing, REST API calls are readily available in ThingWorx, allowing each shard to exchange information with each other, as well as other enterprise data systems, like ERP or Service Ticketing. Sometimes, lower-level data replication strategies are the way to go, say if downsampling or data transfer from one store to another is necessary, and built-in database tools can more easily handle the workload. Most of the time, however, REST API calls are used to define the business logic within the application layer so that copying data actively between shards is unnecessary, using fewer resources to control what information is shared overall.   There are several design approaches for REST API communication between shards, the two most common being the “peer” model and the “layered” model. In a peer model, one shard may call upon another shard using REST whenever it needs more information. In a layered model, there are “front-line” shards which handle most (if not all) of the device communication and time-critical use cases, things which require only the information in one shard to operate. Then there are also “back-line” shards that aggregate data from the many front-line shards, performing any operations that are less time-sensitive and more complex or analytical.   For any of these approaches, it remains important to keep your data archival and purge strategy in mind. It is a best practice in ThingWorx to only retain as much data as is absolutely necessary, purging the rest periodically. If the front-line shards only ever need the last 7 days of raw data for 5 properties, plus the last 52 weeks of min/max/average data, then implementing an approach where each shard computes the min/max/average values and then archives the older data to a shared “data lake” before purging it would be ideal. This data lake then serves as the data store for all back-line shard operations.   There is also the option to consider sharing some infrastructure between ThingWorx instances when using sharding in a deployment, which can create more flexible, scalable architectures, but can also introduce issues where more than one shard is affected when issues occur on only one shard. For instance, a common shared infrastructure piece is at the database level; each ThingWorx instance needs its own database, but a single database server instance (such as a PostgreSQL HA cluster) could serve separate database namespaces to multiple ThingWorx instances. This is an attractive option where an existing enterprise-scale database infrastructure with experienced DBAs is already in place.   Similarly, load balancers can often be configured to manage load for multiple servers or URLs. If properly configured, an experienced load balancer could direct traffic for multiple applications, but it can also create a bottleneck for inbound connections if not properly sized. Load balancers designed for High Availability can also be considered. Apache Zookeeper is another tool often deployed once for an entire cluster to monitor the health and availability of individual components, or to vote them in or out of operations if problems are detected. With all of these options, remember that sharing infrastructure increases the chances of sharing issues from one ThingWorx system to the next, which can reduce the overall infrastructure complexity at the cost of increased administrative complexity.   Bringing it All Together Vertical and Horizontal scale are both effective ways to add more capacity and availability to software implementations, but there are typically some diminishing returns in the investment of additional infrastructure. For example, consider two large, vertically-scaled implementations – one running on a VM with 64 vCPUs and 256 GiB RAM, and another running on a VM with 96 vCPUs and 384GiB of RAM. While the 96-core server has 1.5 times the compute capacity, in sizing tests with 1 million simulated assets, these two systems tend to fall behind on WebSocket execution at approximately the same point.     In a horizontal scale example, with two nodes each sized the same (64 vCPU and 256GiB RAM), one would expect High Availability to occur, where one node picks up the other’s slack in a failover scenario. However, what if that singular node can’t handle the entire workload? Should both machines be sized vertically such that either can take on the full load, and if so, then what is the cost-benefit of that? It would be less expensive in this case to have a third server.     Optimizing the deployment architecture for a ThingWorx application will therefore usually involve a blended approach. With more than two nodes, High Availability is more readily obtained, as two servers can almost certainly share the load of the third, failed node. Likewise, some workload aspects do not scale well until multiple additional nodes are added. For instance, spreading out the user load from mashup requests across multiple nodes to give the singleton more resources for the tasks it alone can perform doesn’t have much benefit if there are just two nodes.   However, with horizontal scaling alone, the servers may still need to be vertically scaled larger than is ideal in terms of cost. Each one has to hold the entire Thing Model in memory, which means that sometimes, some of the nodes may be oversized for the tasks actually performed there. Sharding allows for each node to have a different Thing Model as necessary based around what boundaries are selected, which can mean saving on costs by sizing each server only as large as it really needs to be.   So, a combination of approaches is typically the best when it comes to deployment architecture. The key is to break things up as much as possible, but in ways that make sense. Determine where the boundaries of the shards will be such that each machine can be as light and focused as possible, while still not introducing more work in terms of user effort (having to access two system to get the job done), application development (extra code used to maintain multiple systems or exchange information between them), and system administration (monitoring and maintaining multiple enterprise systems).   Find the right balance for your systems, and you’ll maximize your cost-benefit ratio and get the most out of your ThingWorx application. Happy developing!
View full tip