cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 
Sort by:
​​​There are four types of Analytics:                                                                 Prescriptive analytics: What should I do about it? Prescriptive analytics is about using data and analytics to improve decisions and therefore the effectiveness of actions.Prescriptive analytics is related to both Descriptive and Predictive analytics. While Descriptive analytics aims to provide insight into what has happened and Predictive analytics helps model and forecast what might happen, Prescriptive analytics seeks to determine the best solution or outcome among various choices, given the known parameters. “Any combination of analytics, math, experiments, simulation, and/or artificial intelligence used to improve the effectiveness of decisions made by humans or by decision logic embedded in applications.”These analytics go beyond descriptive and predictive analytics by recommending one or more possible courses of action. Essentially they predict multiple futures and allow companies to assess a number of possible outcomes based upon their actions. Prescriptive analytics use a combination of techniques and tools such as business rules, algorithms, machine learning and computational modelling procedures. Prescriptive analytics can also suggest decision options for how to take advantage of a future opportunity or mitigate a future risk, and illustrate the implications of each decision option. In practice, prescriptive analytics can continually and automatically process new data to improve the accuracy of predictions and provide better decision options. Prescriptive analytics can be used in two ways: Inform decision logic with analytics: Decision logic needs data as an input to make the decision. The veracity and timeliness of data will insure that the decision logic will operate as expected. It doesn’t matter if the decision logic is that of a person or embedded in an application — in both cases, prescriptive analytics provides the input to the process. Prescriptive analytics can be as simple as aggregate analytics about how much a customer spent on products last month or as sophisticated as a predictive model that predicts the next best offer to a customer. The decision logic may even include an optimization model to determine how much, if any, discount to offer to the customer. Evolve decision logic: Decision logic must evolve to improve or maintain its effectiveness. In some cases, decision logic itself may be flawed or degrade over time. Measuring and analyzing the effectiveness or ineffectiveness of enterprises decisions allows developers to refine or redo decision logic to make it even better. It can be as simple as marketing managers reviewing email conversion rates and adjusting the decision logic to target an additional audience. Alternatively, it can be as sophisticated as embedding a machine learning model in the decision logic for an email marketing campaign to automatically adjust what content is sent to target audiences. Different technologies of Prescriptive analytics to create action: Search and knowledge discovery: Information leads to insights, and insights lead to knowledge. That knowledge enables employees to become smarter about the decisions they make for the benefit of the enterprise. But developers can embed search technology in decision logic to find knowledge used to make decisions in large pools of unstructured big data. Simulation: ​Simulation imitates a real-world process or system over time using a computer model. Because digital simulation relies on a model of the real world, the usefulness and accuracy of simulation to improve decisions depends a lot on the fidelity of the model. Simulation has long been used in multiple industries to test new ideas or how modifications will affect an existing process or system. Mathematical optimization: Mathematical optimization is the process of finding the optimal solution to a problem that has numerically expressed constraints. Machine learning: “Learning” means that the algorithms analyze sets of data to look for patterns and/or correlations that result in insights. Those insights can become deeper and more accurate as the algorithms analyze new data sets. The models created and continuously updated by machine learning can be used as input to decision logic or to improve the decision logic automatically. Paragmetic AI: ​Enterprises can use AI to program machines to continuously learn from new information, build knowledge, and then use that knowledge to make decisions and interact with people and/or other machines.                                               Use of Prescriptive Analytics in ThingWorx Analytics: Thing Optimizer: Thing Optimizer functionality provides the prescriptive scoring and optimization capabilities of ThingWorx Analytics. While predictive scoring allows you to make predictions about future outcomes, prescriptive scoring allows you to see how certain changes might affect future outcomes. After you have generated a prediction model (also called training a model), you can modify the prescriptive attributes in your data (those attributes marked as levers) to alter the predictions. The prescriptive scoring process evaluates each lever attribute, and returns an optimal value for that feature, depending on whether you want to minimize or maximize the goal variable. Prescriptive scoring results include both an original score (the score before any lever attributes are changed) and an optimized score (the score after optimal values are applied to the lever attributes). In addition, for each attribute identified in your data as a lever, original and optimal values are included in the prescriptive scoring results. How to Access Thing Optimizer Functionality: ThingWorx Analytics prescriptive scoring can only be accessed via the REST API Service. Using a REST client, you can access the Scoring service which includes a series of API endpoints to submit scoring requests, retrieve results, list jobs, and more. Requires installation of the ThingWorx Analytics Server. How to avoid mistakes - Below are some common mistakes while doing Prescriptive analytics: Starting digital analytics without a clear goal Ignoring core metrics Choosing overkill analytics tools Creating beautiful reports with little business value Failing to detect tracking errors                                                                                                                                 Image source: Wikipedia, Content: go.forrester.com(Partially)
View full tip
Welcome to the ThingWorx Manufacturing Apps Community! The ThingWorx Manufacturing Apps are easy to deploy, pre-configured role-based starter apps that are built on PTC’s industry-leading IoT platform, ThingWorx. These Apps provide manufacturers with real-time visibility into operational information, improved decision making, accelerated time to value, and unmatched flexibility to drive factory performance.   This Community page is open to all users-- including licensed ThingWorx users, Express (“freemium”) users, or anyone interested in trying the Apps. Tech Support community advocates serve users on this site, and are here to answer your questions about downloading, installing, and configuring the ThingWorx Manufacturing Apps.     A. Sign up: ThingWorx Manufacturing Apps Community: PTC account credentials are needed to participate in the ThingWorx Community. If you have not yet registered a PTC eSupport account, start with the Basic Account Creation page.   Manufacturing Apps Web portal: Register a login for the ThingWorx Manufacturing Apps web portal, where you can download the free trial and navigate to the additional resources discussed below.     B. Download: Choose a download/packaging option to get started.   i. Express/Freemium Installer (best for users who are new to ThingWorx): If you want to quickly install ThingWorx Manufacturing Apps (including ThingWorx) use the following installer: Download the Express/Freemium Installer   ii. 30-day Developer Kit trial: To experience the capabilities of the ThingWorx Platform with the Manufacturing Apps and create your own Apps: Download the 30-day Developer Kit trial   iii. Import as a ThingWorx Extension (for users with a Manufacturing Apps entitlement-- including ThingWorx commercial customers, PTC employees, and PTC Partners): ThingWorx Manufacturing apps can be imported as ThingWorx extensions into an existing ThingWorx Platform install (v8.1.0). To locate the download, open the PTC Software Download Page and expand the following folders:   ThingWorx Platform | Release 8.x | ThingWorx Manufacturing Apps Extension | Most Recent Datacode     C. Learn After downloading the installer or extensions, begin with Installation and Configuration.   Follow the steps laid out in the ThingWorx Manufacturing Apps Setup and Configuration Guide 8.2   Find helpful getting-started guides and videos available within the 'Get Started' section of the ThingWorx Manufacturing Apps Portal.     D. Customize Once you have successfully downloaded, installed, and configured the Manufacturing Apps, begin to explore the deeper potential of the Apps and the ThingWorx Platform.   Follow along with the discussion and steps contained in the ThingWorx Manufacturing Apps and Service Apps Customization Guide  8.2   Also contained within the the 'Get Started' page of the ThingWorx Manufacturing Apps Portal, find the "Evolve and Expand" section, featuring: -Custom Plant Layout application -Custom Asset Advisor application -Global Plant View application -Thingworx Manufacturing Apps Technical Lab with Sigma Tile (Raspberry Pi application) -Configuring the Apps with demo data set and simulator -Additional Advanced Documentation     E. Get help / give feedback / interact Use the ThingWorx Manufacturing Apps Community page as a resource to find documentation, peruse past forum threads, or post a question to start a discussion! For advanced troubleshooting, licensed users are encouraged to submit support tickets to the PTC My eSupport portal.
View full tip
Load Testing through C SDK Remote Device Simulation in ThingWorx   As discussed in the EDC's previous article, load or stress testing a ThingWorx application is very important to the application development process and comes highly recommended by PTC best practices. This article will show how to do stress testing using the ThingWorx C SDK at the Edge side. Attached to this article is a download containing a generic C SDK application and accompanying simulator software written in python. This article will discuss how to unpack everything and move it to the right location on a Linux machine (Ubuntu 16.04 was used in this tutorial and sudo privileges will be necessary). To make this a true test of the Edge software, modify the C SDK code provided or substitute in any custom code used in the Edge devices which connect to the actual application.   It is assumed that ThingWorx is already installed and configured correctly. Anaconda will be downloaded and installed as a part of this tutorial. Note that the simulator only logs at the "error" level on the SDK side, and the data log has been disabled entirely to save resources. For any questions on this tutorial, reach out to the author Desheng Xu from the EDC team (@DeShengXu).   Background: Within ThingWorx, most things represent remote devices located at the Edge. These are pieces of physical equipment which are out in the field and which connect and transmit information to the ThingWorx Platform. Each remote device can have many properties, which can be bound to local properties. In the image below, the example property "Pressure" is bound to the local property "Pressure". The last column indicates whether the property value should be stored in a time series database when the value changes. Only "Pressure" and "TotalFlow" are stored in this way.  A good stress test will have many properties receiving updates simultaneously, so for this test, more properties will be added. An example shown here has 5 integers, 3 numbers, 2 strings, and 1 sin signal property.   Installation: Download Python 3 if it isn't already installed Download Anaconda version 5.2 Sometimes managing multiple Python environments is hard on Linux, especially in Ubuntu and when using an Azure VM. Anaconda is a very convenient way to manage it. Some commands which may help to download Anaconda are provided here, but this is not a comprehensive tutorial for Anaconda installation and configuration. Download Anaconda curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh  Install Anaconda (this may take 10+ minutes, depending on the hardware and network specifications) bash Anaconda3-5.2.0-Linux-x86_64.sh​ To activate the Anaconda installation, load the new PATH environment variable which was added by the Anaconda installer into the current shell session with the following command: source ~/.bashrc​ Create an environment for stress testing. Let's name this environment as "stress" conda create -n stress python=3.7​ Activate "stress" environment every time you need to use simulator.py source activate stress​  Install the required Python modules Certain modules are needed in the Python environment in order to run the simulator.py  file: psutil, requests. Use the following commands to install these (if using Anaconda as installed above): conda install -n stress -c anaconda psutil conda install -n stress -c anaconda requests​ Unpack the download attached here called csim.zip Unzip csim.zip  and move it into the /opt  folder (if another folder is used, remember to change the page in the simulator.json  file later) Assign your current user full access to this folder (this command assumes the current user is called ubuntu ) : sudo chown -R ubuntu:ubuntu /opt/csim   Move the C SDK source folder to the lib  folder Use the following command:  sudo mv /opt/csim/csdkbuild/libtwCSdk.so.2.2.4 /usr/lib​ You may have to also grant a+x permissions to all files in this folder Update the configuration file for the simulator Open /opt/csim/simulator.json  (or whatever path is used instead) Edit this file to meet your environment needs, based on the information below Familiarize yourself with the simulator.py file and its options Use the following command to get option information: python simulator.py --help​   Set-Up Test Scenario: Plan your test Each simulator instance will have 8 remote properties by default (as shown in the picture in the Background section). More properties can be added for stress test purposes in the simulator.json  file. For the simulator to run 1k writes per second to a time series database, use the following configuration information (note that for this test, a machine with 4 cores and 16G of memory was used. Greater hardware specifications may be required for a larger test): Forget about the default 8 properties, which have random update patterns and result in difficult results to check later. Instead, create "canary properties" for each thing (where canary refers to the nature of a thing to notify others of danger, in the same way canaries were used in mine shafts) Add 25 properties for each thing: 10 integer properties 5 number properties 5 string properties 5 sin properties (signals) Set the scan rate to 5000 ms, making it so that each of these 25 properties will update every 5 seconds. To get a writes per second rate of 1k, we therefore need 200 devices in total, which is specified by the start and end number lines of the configuration file The simulator.json  file should look like this: Canary_Int: 10 Canary_Num: 5 Canary_Str: 5 Canary_Sin: 5 Start_Number: 1 End_Number: 200​ Run the simulator Enter the /opt/csim  folder, and execute the following command: python simulator.py ./simulator.json -i​ You should be able to see a screen like this: Go to ThingWorx to check if there is a dummy thing (under Remote Things in the Monitoring section): This indicates that the simulator is running correctly and connected to ThingWorx Create a Value Stream and point it at the target database Create a new thing and call it "SimulatorDummyThing" Once this is created successfully and saved, a message should pop up to say that the device was successfully connected Bind the remote properties to the new thing Click the "Properties and Alerts" tab Click "Manage Bindings" Click "Add all properties" Click "Done" and then "Save" The properties should begin updating immediately (every 5 seconds), so click "Refresh" to check Create a Thing Template from this thing Click the "More" drop-down and select "Create ThingTemplate" Give the template a name (ensure it matches what is defined in the simulator.json  file) and save it Go back and delete the dummy thing created in Step 4, as now we no longer need it Clean up the simulator Use the following command: python simulator.py ./simulator.json -k​ Output will look like this: Create 200 things in ThingWorx for the stress test Verify the information in the simulator.json  file (especially the start and end numbers) is correct Use the following command to create all things: python simulator.py ./simulator.json -c​ The output will look like this: Verify the things have also been created in ThingWorx: Now you are ready for the stress test   Run Stress Test: Use the following command to start your test: python simulator.py ./simulator.json -l​ or python simulator.py ./simulator.json --launch The output in the simulator will look like this: Monitor the Value Stream writing status in the Monitoring section of ThingWorx:   Stop and Clean Up: Use the following command to stop running all instances: python simulator.py ./simulator -k​ If you want to clean up all created dummy things, then use this command: python simulator.py ./simulator -d​ To re-initiate the test at a later date, just repeat the steps in the "Run Stress Test" section above, or re-configure the test by reviewing the steps in the "Set-Up Test Scenario" section   That concludes the tutorial on how to use the C SDK in a stress or load test of a ThingWorx application. Be sure to modify the created Thing Template (created in step 6 of the "Set-Up Test Scenario" section) with any business logic required, for instance events and alerts, to ensure a proper test of the application. 
View full tip
This document attached to this blog entry actually came out of my first exposure to using the C SDK on a Raspberry PI. I took notes on what I had to do to get my own simple edge application working and I think it is a good introduction to using the C SDK to report real, sampled data. It also demonstrates how you can use the C SDK without having to use HTTPS. It demonstrates how to turn off HTTPS support. I would appreciate any feedback on this document and what additions might be useful to anyone else who tries to do this on their own.
View full tip
Original Post Date:      June 6, 2016   Description: This is a video tutorial on creating an InfoTable through a service, creating and adding a Data Shape adding rows to the Infotable through a service, adding rows to an InfoTable property and querying the InfoTable.    
View full tip
Timers and schedulers can be useful tool in a Thingworx application.  Their only purpose, of course, is to create events that can be used by the platform to perform and number of tasks.  These can range from, requesting data from an edge device, to doing calculations for alerts, to running archive functions for data.  Sounds like a simple enough process.  Then why do most platform performance issues seem to come from these two simple templates? It all has to do with how the event is subscribed to and how the platform needs to process events and subscriptions.  The tasks of handling MOST events and their related subscription logic is in the EventProcessingSubsystem.  You can see the metrics of this via the Monitoring -> Subsystems menu in Composer.  This will show you how many events have been processed and how many events are waiting in queue to be processed, along with some other settings.  You can often identify issues with Timers and Schedulers here, you will see the number of queued events climb and the number of processed events stagnate. But why!?  Shouldn't this multi-threaded processing take care of all of that.  Most times it can easily do this but when you suddenly flood it with transaction all trying to access the same resources and the same time it can grind to a halt. This typically occurs when you create a timer/scheduler and subscribe to it's event at a template level.  To illustrate this lets look at an example of what might occur.  In this scenario let's imagine we have 1,000 edge devices that we must pull data from.  We only need to get this information every 5 minutes.  When we retrieve it we must lookup some data mapping from a DataTable and store the data in a Stream.  At the 5 minute interval the timer fires it's event.  Suddenly all at once the EventProcessingSubsystem get 1000 events.  This by itself is not a problem, but it will concurrently try to process as many as it can to be efficient.  So we now have multiple transactions all trying to query a single DataTable all at once.  In order to read this table the database (no matter which back end persistence provider) will lock parts or all of the table (depending on the query).  As you can probably guess things begin to slow down because each transaction has the lock while many others are trying to acquire one.  This happens over and over until all 1,000 transactions are complete.  In the mean time we are also doing other commands in the subscription and writing Stream entries to the same database inside the same transactions.  Additionally remember all of these transactions and data they access must be held in memory while they are running.  You also will see a memory spike and depending on resource can run into a problem here as well. Regular events can easily be part of any use case, so how would that work!  The trick to know here comes in two parts.  First, any event a Thing raises can be subscribed to on that same Thing.  When you do this the subscription transaction does not go into the EventProcessingSubsystem.  It will execute on the threads already open in memory for that Thing.  So subscribing to a timer event on the Timer Thing that raised the event will not flood the subsystem. In the previous example, how would you go about polling all of these Things.  Simple, you take the exact logic you would have executed on the template subscription and move it to the timer subscription.  To keep the context of the Thing, use the GetImplimentingThings service for the template to retrieve the list of all 1,000 Things created based on it.  Then loop through these things and execute the logic.  This also means that all of the DataTable queries and logic will be executed sequentially so the database locking issue goes away as well.  Memory issues decrease also because the allocated memory for the quries is either reused or can be clean during garbage collection since the use of the variable that held the result is reallocated on each loop. Overall it is best not to use Timers and Schedulers whenever possible.  Use data triggered events, UI interactions or Rest API calls to initiate transactions whenever possible.  It lowers the overall risk of flooding the system with recourse demands, from processor, to memory, to threads, to database.  Sometimes, though, they are needed.  Follow the basic guides in logic here and things should run smoothly!
View full tip
In this post, I will use an instance of InfluxDB and Chronograf. See this post for installing both using Docker. InfluxDB - Time Series Databases   InfluxDB is a time series database. It allows users to work with and organize time series data. The advantage of such a database system is that it comes with built-in functionality to easily aggregate and operate on data based on time intervals. Other types of databases can do this as well - but time series databases are heavily optimized for this kind of data structures which will show in storage space and performance.   Data is stored in the database with its timestamp, its value and one or more tags.   Time Temperature Humidity Location 2019-01-24T00:00:00 23 42 Home 2019-01-24T00:01:00 22 43 Home 2019-01-24T00:02:00 21 44 Home 2019-01-24T00:03:00 23 45 Home 2019-01-24T00:04:00 24 42 Home 2019-01-24T00:05:00 25 43 Home 2019-01-24T00:06:00 23 44 Home   Values can be aggregated by intervalls, i.e. "give me the temperatur values within the last hour and take the average for 5 minutes". This would result in (60 / 5) = 12 results with a value that represents the average temperature within this 5 minute interval.   Example: Temperature Data averaged by 4 minutes   Time Temperature 2019-01-24T00:00:00 (23 + 22 + 21+ 23) / 4 = 22,25 2019-01-24T00:04:00 (24 + 25 + 23) / 3 = 24   To find out more about InfluxDB see also https://www.influxdata.com/time-series-database/ and https://www.influxdata.com/time-series-platform/   InfluxDB in ThingWorx   The new ThingWorx 8.4 release comes with an option to setup InfluxDB as additional Persistence Provider. Meta Data like Entity Definitons will still be stored in PostgreSQL. Streams, Value Streams and Data Tables however can be stored in InfluxDB.   The InfluxDB Persistence Provider setup is delivered with the PostgreSQL installation package for ThingWorx. Currently ThingWorx does not allow any aggregation of data with its built-in InfluxDB capabilities.   Prepare InfluxDB   InfluxDB will need a user and a database. Connect via Chronograf - the graphical UI to administer InfluxDB and create a new user via   InfluxDB Admin > Users Default username = twadmin Default password = password Permissions = ALL   Create a new database via   InfluxDB Admin > Databases Default database name = thingworx   Configure ThingWorx   Create a new Persistence Provider for InfluxDB in ThingWorx - but don't mark it as active yet!     Switch to the Configuration and change the username / password, database and hostname to match your installation.     Save the configuration, switch back to the General tab and mark the InfluxDB Persistence Provider as Active.   Save again and a "successful" message will be shown. If the save action failed, the connection settings are not correct - check for the correct ports and for any typos.   Creating Entities & Testing   Streams, Value Streams and Data Tables can now be created using the new InfluxDB Persistence Provider.   To test with a Value Stream   Create a new Thing with some NUMBER properties, e.g. 'a', 'b' and 'c' as properties - ensure they are marked as logged as well Name = InfluxValueStreamThing Create a new ValueStream based and change its Persistance Provider to the InfluxDB created above Name = InfluxValueStream Save both Entities Setting values for the properties will now automatically create the entries in InfluxDB - including the Entity name "InfluxValueStreamThing" Running the QueryPropertyHistory service on the Thing will return the results as an InfoTable In Chronograf this will display like this:   ThingWorx 8.4 will be released end of January 2019. Be sure to check out and test the new Persistence Provider features!
View full tip
This document is designed to help troubleshoot some commonly seen issues while installing or upgrading the ThingWorx application, prior or instead of contacting Tech Support. This is not a defined template for a guaranteed solution, but rather a reference guide that provides an opportunity to eliminate some of the possible root causes. While following the installation guide and matching the system requirements is sufficient to get a successfully running instance of ThingWorx, some issues could still occur upon launching the app for the first time. Generally, those issues arise from minor environmental details and can be easily fixed by aligning with the proper installation process. Currently, the majority of the installation hiccups are coming from the postgresql side. That being said, the very first thing to note, whether it's a new user trying out the platform or a returning one switching the database to postgresql, note that: Postgresql database must be installed, configured, and running prior to the further Thingworx installation. ThingWorx 7.0+: Installation errors out with 'failed to succeed more than the maximum number of allowed acquisition attempts' Platform being shut down because System Ownership cannot be acquired error ERROR: relation "system_version" does not exist Resolution: Generally, this type of error point at the security/permission issue. As all of the installation operations should be performed by a root/Administrator role, the following points should be verified: Ensure both Tomcat and ThingworxPlatform folders have relevant read/write permissions The title and contents of the configuration file in the ThingworxPlatform folder has changed from 6.x to 7.x Check if the right configuration file is in the folder Verify if the name and password provided in this configuration file matches the ones set in the Postgres DB Run the Database cleanup script, and then set up the database again. Verufy by checking the thingworx table space (about 53 tables should be created)     Thingworx Application: Blank screen, no errors in the logs, "waiting for <url> " gears running be never actually loading, eventually times out     Resolution: Ensure that Java in tomcat is pointing to the right path, should be something like this: C:\Program Files\Java\jre1.8.0_101\bin\server\jvm.dll 6.5+ Postgres:   Error when executing thingworxpostgresDBSetup.bat psql:./thingworx-database-setup.sql:1: ERROR: could not set permissions on directory "D:/ThingworxPostgresqlStorage": Permission denied     Resolution:     The error means that the postgres user was not able to create a directory in the ‘ThingworxPostgresStorage’ directory. As it's related to the security/permission, the following steps can be taken to clear out the error: Assigning read/write permissions to everyone user group to fix the script execution and then execute the batch file: Right-click on ‘ThingworxPostgresStorage’ directory -> Share with -> specific people. Select drop-down, add everyone group and update the permission level to Read/Write. Click Share. Executing the batch file as admin. 2. Installation error message "relation root_entity_collection does not exist" is displayed with Postgresql version of the ThingWorx platform. Resolution:     Such an error message is displayed only if the schema parameter passed to thingworxPostgresSchemaSetup.sh script  is different than $USER or PUBLIC. To clear out the error: Edit the Postgresql configuration file, postgresql.conf, to add to the SEARCH_PATH item your own schema . Other common errors upon launching the application. Two of the most commonly seen errors are 404 and 401.  While there can be a numerous reasons to see those errors, here are the root causes that fall under the "very likely" category: 404 Application not found during a new install: Ensure Thingworx.war was deployed -- check the hard drive directory of Tomcat/webapps and ensure Thingworx.war and Thingworx folder are present as well as the ThingworxStorage in the root (or custom selected location) Ensure the Thingworx.war is not corrupted (may re-download from the support and compare the size) 401 Application cannot be accessed during a new install or upgrade: For Postgresql, ensure the database is running and is connected to, also see the Basic Troubleshooting points below. Verify the tomcat, java, and database (in case of postgresql) versions are matching the system requirement guide for the appropriate platform version Ensure the updrade was performed according to the guide and the necessary folders were removed (after copying as a preventative measure). Ensure the correct port is specified in platform-settings.json (for Postgresql), by default the connection string is jdbc:postgresql://localhost: 5432 /thingworx Again, it should be kept in mind that while the symptoms are common and can generally be resolved with the same solution, every system environment is unique and may require an individual approach in a guaranteed resolution. Basic troubleshooting points for: Validating PostgreSQL installation Postgres install troubleshooting java.lang.NullPointerException error during PostgreSQL installation ***CRITICAL ERROR: permission denied for relation root_entity_collection Error while running scripts: Could not set permissions on directory "/ThingworxPostgresqlStorage":Permission Denied Acquisition Attempt Failed error Resolution : Ensure 'ThingworxStorage', 'ThingworxPlatform' and 'ThingworxPostgresqlStorage' folders are created The folders have to be present in the root directory unless specifically changed in any configurations Recommended to grant sufficient privileges (if not all) to the database user (twadmin) Note: While running the script in order to create a database, if a schema name other than 'public' is used, the "search_path" in "postgresql.conf" must be changed to reflect 'NewSchemaName, public' Grant permission to user for access to root folders containing 'ThingworxPostgresqlStorage' and 'ThingworxPlatform' The password set for the default 'twadmin' in the pgAdmin III tool must match the password set in the configuration file under the ThingworxPlatform folder Ensure THINGWORX_PLATFORM_SETTINGS variable is set up Error: psql:./thingworx-database-setup.sql:14: ERROR:  could not create directory "pg_tblspc/16419/PG_9.4_201409291/16420": No such file or directory psql:./thingworx-database-setup.sql:16: ERROR:  database "thingworx" does not exist Resolution: R eplacing /ThingworxPostgresqlStorage in the .bat file by C:\ThingworxPostgresqlStorage and omitting the -l option in the command window. Also, note the following error Troubleshooting Syntax Error when running postgresql set up scripts
View full tip
Applicable Releases: ThingWorx Platform 7.0 to 8.5   Description:   Introduction to ThingWorx Extension Development, with the following topics: What is an Extension Why building an Extension Prerequisites Installing Eclipse plugin and features Creating entities with the plugin and including exported Entities in an Extension Project Upgrading or Updating and Existing extension in ThingWorx Building with Gradle and Ant       ThingWorx Extension Development Guide
View full tip
Large files could cause slow response times. In some cases large queries might cause extensively large response files, e.g. calling a ThingWorx service that returns an extensively large result set as JSON file.   Those massive files have to be transferred over the network and require additional bandwidth - for each and every call. The more bandwidth is used, the more time is taken on the network, the more the impact on performance could be. Imagine transferring tens or hundreds of MB for service calls for each and every call - over and over again.   To reduce the bandwidth compression can be activated. Instead of transferring MBs per service call, the server only has to transfer a couple of KB per call (best case scenario). This needs to be configured on Tomcat level. There is some information availabe in the offical Tomcat documation at https://tomcat.apache.org/tomcat-8.5-doc/config/http.html Search for the "compression" attribute.   Gzip compression   Usually Tomcat is compressing content in gzip. To verify if a certain response is in fact compressed or not, the Development Tools or Fiddler can be used. The Response Headers usually mention the compression type if the content is compressed:     Left: no compression Right: compression on Tomcat level   Not so straight forward - network vs. compression time trade-off   There's however a pitfall with compression on Tomcat side. Each response will add additional strain on time and resources (like CPU) to compress on the server and decompress the content on the client. Especially for small files this might be an unnecessary overhead as the time and resources to compress might take longer than just transferring a couple of uncompressed KB.   In the end it's a trade-off between network speed and the speed of compressing, decompressing response files on server and client. With the compressionMinSize attribute a compromise size can be set to find the best balance between compression and bandwith.   This trade-off can be clearly seen (for small content) here:     While the Size of the content shrinks, the Time increases. For larger content files however the Time will slightly increase as well due to the compression overhead, whereas the Size can be potentially dropped by a massive factor - especially for text based files.   Above test has been performed on a local virtual machine which basically neglegts most of the network related traffic problems resulting in performance issues - therefore the overhead in Time are a couple of milliseconds for the compression / decompression.   The default for the compressionMinSize is 2048 byte.   High potential performance improvement   Looking at the Combined.js the content size can be reduced significantly from 4.3 MB to only 886 KB. For my simple Mashup showing a chart with Temperature and Humidity this also decreases total load time from 32 to 2 seconds - also decreasing the content size from 6.1 MB to 1.2 MB!     This decreases load time and size by a factor of 16x and 5x - the total time until finished rendering the page has been decreased by a factor of almost 22x! (for this particular use case)   Configuration   To configure compression, open Tomcat's server.xml   In the <Connector> definitions add the following:   compression="on" compressibleMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json"     This will use the default compressionMinSize of 2048 bytes. In addition to the default Mime Types I've also added application/json to compress ThingWorx service call results.   This needs to be configured for all Connectors that users should access - e.g. for HTTP and HTTPS connectors. For testing purposes I have a HTTPS connector with compression while HTTP is running without it.   Conclusion   If possible, enable compression to speed up content download for the client.   However there are some scenarios where compression is actually not a good idea - e.g. when using a WAN Accelerator or other network components that usually bring their own content compression. This not only adds unnecessary overhead but is compressing twice which might lead to errors on client side when decompressing the content.   Especially dealing with large responses can help decreasing impact on performance. As compressing and decompressing adds some overhead, the min size limit can be experimented with to find the optimal compromise between a network and compression time trade-off.
View full tip
Create a new Thing using the Scheduler Thing Template. The Scheduler Thing will fire a ScheduledEvent Event when the configured schedule is fired. The event is automatically present and does not need to be added manually. Configuration   The Scheduler Configuration is quite straightforward and allows for an exact setup of schedule based on units of time, e.g. seconds, minutes, hours, days of week etc. It can be accessed via the Thing's Entity Configuration   Configuration allows for Changing the runAsUser context - in which the Events will be handled. The user will need visibility and permission on e.g. executing Services or depending Things, which are required to run the Service triggered by the Event. Changing the Schedule - in which time the Events will be fired (by default every minute). The schedule is displayed in CRON String notation and can be changed and viewed in detail by clicking on "More". The CRON String will be generated automatically based on the inputs. Schedules can be configured in Manual mode - allowing for full configuration of each and every time based attribute. Schedules can be configured for a specific time Type - allowing for configuration only based on seconds, minutes, hours, days, weeks, months or years. Below screenshots show schedules running every minute and every Saturday / Sunday at 12:00 ("Every Weekend Day").     Services   Scheduler Things inherit two Services by default from the Thing Template DisableScheduler EnableScheduler These will activate / de-activate the Scheduler and allow / disallow firing Events once a scheduled time is reached If a Scheduler is currenty enabled or disabled can be seen in its properites  
View full tip
Since it's somewhat unclear on how to set up the reset password feature through the login form, these steps might be a little more helpful. Assuming the mail extension has already been imported into the Thingworx platform and properly configured - say, PassReset - (test with SendMessage service to verify), let's go ahead and create a new user - Blank, and a new organization that will have that user assigned as a member - Test. Let's open the configuration tab for the organization, assign the PassReset mail thing as the mail server, assign login image, style, prompt (optional), check the Allow Password Reset, then the rest looks like this: Onto the Email content part, it is not possible to save the organization as is at the moment: Clicking on the question mark for the Email content will provide the following requirements: Now this is when it might not be too clear. The tokens [[:user:]], [[:organization:]], [[:url:]] can be used in the email body and at the runtime will be replaced with the actual Usernames, organization, and the reset password url. Out of those fiels, only [[:url:]] token is required. So, it is sufficient to place only [[:url:]] in the body and save the organization: Then, when going to the FormLogin, at <your thingworx host:port>/Thingworx/FormLogin/<organization name>, a password reset button is available: Filling out the User information in the reset field, the email gets sent to the user address specified and the proper message appears: Since in this example only the [[:url:]]  token has been used in the email content, the email received will look like this: To troubleshoot any errors that might be seen in the process of retrieving the password reset link, it's helpful to check your browser developer tools and Thingworx application log for details.
View full tip
There are Four Types of Analytics:                         Descriptive: What Happened? Descriptive analytics is a preliminary stage of data processing that creates a summary of historical data to yield useful information and possibly prepare the data for further analysis. Analytics, which use data aggregation and data mining to provide insight into the past and answer: “What has happened? Descriptive analysis or statistics does exactly what the name implies they “Describe”, or summarize raw data and make it something that is interpret-able by humans. They are analytics that describe the past. The past refers to any point of time that an event has occurred, whether it is one minute ago, or one year ago. Descriptive analytics are useful because they allow us to learn from past behaviors, and understand how they might influence future outcomes. The vast majority of the statistics we use fall into this category. (Think basic arithmetic like sums, averages, percent changes). Usually, the underlying data is a count, or aggregate of a filtered column of data to which basic math is applied. For all practical purposes, there are an infinite number of these statistics. Descriptive statistics are useful to show things like, total stock in inventory, average dollars spent per customer and Year over year change in sales. Common examples of descriptive analytics are reports that provide historical insights regarding the company’s production, financials, operations, sales, finance, inventory and customers. Note: Use Descriptive Analytics when you need to understand at an aggregate level what is going on in your company, and when you want to summarize and describe different aspects of your business.                                     Different techniques of Descriptive Analytics: Sampling Mean Mode Median Standard Deviation Range and Variance Stem and Leaf Diagram Histogram Quartiles Frequency Distributions Use of Descriptive Analytics in ThingWorx Analytics: Signal Detection: When analyzing volumes of data, it is helpful to know which data is actually useful and which data is just noise. Signals are based on a correlation algorithm that examines historical data to identify the strength of a given input in predicting future outcomes. Signals can identify meaningful correlations within the data. Signals are useful during initial analysis to determine which features you want to curate in a given data-set for predictive model generation. For example, knowing the month of the year is more important to accurately predicting tomorrow’s weather than knowing the day of the week. The month has a much stronger signal than the day of the week for this prediction. ThingWorx Analytics reports signal strength in a mutual information (MI) score that represents the probability of predicting the goal variable when a given feature is provided. It can effectively capture non-linear relationships. ThingWorx Analytics evaluates each feature, or combination of features, to identify the top signals. Cluster Analysis: Cluster analysis categorizes data into groups based on similarities relative to a goal variable. Like a clique, objects in a cluster minimize intra-distances (distances within the cluster) while maximizing inter-distances (distances between clusters). Clusters are mutually exclusive, meaning that each record can belong to only one cluster. However, ThingWorx Analytics supports a user-defined cluster hierarchy that can include sub-clusters inside other clusters. The higher the number of clusters in the data, the smaller each cluster’s population will be, but the stronger the potential insights can be. How to Access Descriptive Analysis Functionality via ThingWorx Analytics: REST API Service — Using a REST client, you can access the Signals Service and the Clusters Service. Each service includes a series of API endpoints to submit analysis requests, retrieve results, list jobs, and more. Requires installation of the ThingWorx Analytics Server. Analytics Builder — As part of the ThingWorx Analytics Extension, Analytics Builder provides a user interface for interacting with your data. In addition to generating and scoring predictive models in Analytics Builder, you can also run procedures to generate signals. How to avoid mistakes - Useful tips for Different Techniques of Descriptive Analytics: Crystallize the research problem → Operability of it! Read literature on data analysis techniques. Evaluate various techniques that can do similar things w.r.t. to research problem. Know what a technique does and what it doesn’t. Consult people, esp. supervisor.
View full tip
1. Add an Json parameter Example: { ​    "rows":[         {             "email":" example1@ptc.com "         },         {             "name":"Qaqa",             "email":" example2@ptc.com "         }     ] } 2. Create an Infotable with a DataShape using CreateInfoTableFromDataShape(params) 3. Using a for loop, iterate through each Json object and add it to the Infotable using InfoTableName.AddRow(YourRowObjectHere) Example: var params = {     infoTableName: "InfoTable",     dataShapeName : "jsontest" }; var infotabletest = Resources["InfoTableFunctions"].CreateInfoTableFromDataShape(params); for(var i=0; i<json.rows.length; i++) {     infotabletest.AddRow({name:json.rows.name,email:json.rows.email}); }
View full tip
Putting this out because this is a difficult problem to troubleshoot if you don't do it right. Let's say you have an application where you have visibility permissions in effect. So you have Users group removed from the Everyone Organization Now you have a Thing "Thing1" with Properties that are being logged to a ValueStream "VS1" What do you need to make this work? Obviously the necessary permissions to Write the values to the Thing1 and read the values from Thing1 (for UI) But for visibility what you'll need is: Visibility to Thing1 (makes sense) Visibility to the Persistence Provider of the ValueStream VS1 !!!! Nope you don't need Visibility to the ValueStream itself, but you DO need Visibility to the Persistence Provider of that ValueStream The way the lack of this permission was showing in the Application Log was a message about trying to provide a Null value.
View full tip
Exciting news! ThingWorx now has improved support for Docker containers to help you manage CI/CD, improve development efficiency in your organization and save costs. Check out these FAQs below and, as always, reach out to me if you have any additional questions.   Stay connected, Kaya   FAQs: ThingWorx Docker Containers   What are Docker Containers? From Docker.com: “a Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings”. Learn more here.   What's the difference between Docker containers and VMs? Containers are an abstraction at the app layer that packages code and dependencies together, whereas Virtual Machines (VMs) are an abstraction of physical hardware turning one server into many servers. Here are some great discussions on it on Stack Overflow. Containers vs. VMs   How can I build ThingWorx Docker images? Check out the Building ThingWorx 8.3 Docker Images Guide or watch this video to instruct you on how to build and test Docker containers. (view in My Videos)   How does PTC support building ThingWorx Docker images? PTC provides the ability for customers and partners to build ThingWorx Docker images. A customer can download the Dockerfiles and scripts packaged as a zip folder from the PTC Software Downloads Portal under “ThingWorx Platform,” then “Release 8.3”  then“ThingWorx Dockerfiles.” (Please note that you must be logged in for the link to function properly.) PTC Software Downloads Portal The zip folder contains the Dockerfiles, template jar, and scripts to fetch Tomcat, and ThingWorx WAR files using CLI. Java must be downloaded manually from the vendor's website. We also provide an instructional guide called “Build ThingWorx Docker Images” available on the Reference Documents page on the Support Portal.   How are ThingWorx Docker images different from the usual delivery media of WAR files? The WAR file delivery is typically accompanied by an installation guide that contains the manual steps for creating the VM or bare-metal environment. That guide includes instructions for the administrator to manually install the prerequisites, including Tomcat, Java, and ThingWorx platform settings files. To deploy and run the WAR file, the administrator follows the guide to create the runtime environment on an OS. In contrast, the Dockerfile build in this delivery automates the creation of a Docker image once supplied with the prerequisites.   Do you have any reference deployment and guidance? Yes, you can refer to our blog post to learn how to deploy and run ThingWorx Docker containers on your existing Kubernetes environment.   Is there any recommendation on which Container Orchestrator as a Service (CaaS) a customer should run ThingWorx Foundation Docker container images on? You can use Docker-Compose for testing, but it is generally not suggested for production deployment use cases. In a production environment, customers should use container orchestrators such as Kubernetes, OpenShift, Azure Kubernetes Service (AKS), or Amazon Elastic Container Service for Kubernetes (Amazon EKS), to deploy and manage ThingWorx Docker images.   What are the skill sets required? Familiarity with OS CLI and Docker tools is required to build building the ThingWorx Docker images. Familiarity with Docker-compose to run the resulting Docker containers is needed to test the resulting builds. We don’t recommend Docker-Compose for production use, but when using it for local testing and demo purposes, users can rapidly install ThingWorx and get it up and running in minutes. We expect PTC partners and customers who want to run ThingWorx containerized instances in their production environment to possess the required skill sets within their DevOps team.   How is ThingWorx licensing handled with the Docker images? By default, the container created from these Docker images starts up in a limited mode with no license supplied. You can configure your username and password for the PTC licensing portal to automatically load a license via environment variables passed into the container on startup. Additionally, you can mount a volume to the /ThingworxPlatform directory, which contains your license file, or to retrieve a license request. To keep your Host ID consistent, ensure that the /ThingworxStorage and /ThingworxPlatform directories are persisted and not removed with individual container restarts. More detailed instructions can be found in the build guide or in a Kubernetes blog post .   Is Docker free? What version of Docker does PTC support for ThingWorx? Docker is open-source and licensed under the Apache 2 license. Information on Docker licensing can be found here. The following Docker versions are required: Docker Community Edition (docker-ce) Version 18.05.0-ce is recommended. To install the Docker Community Edition on your system, follow the instructions for your operating system on the Docker website here. Docker Compose (docker-compose) Version 1.17.1 is recommended. To install the Docker Compose on your system, follow the instructions for your operating system on the Docker website here. What persistence providers are currently supported? PTC provides the ability to build ThingWorx Foundation containers for the following supported persistence providers: H2 Microsoft SQL Server PostgreSQL Additional persistence providers will be added to the Docker build delivery as the ThingWorx Foundation Platform releases support for those new databases in future releases.   What are some of the security best practices? For production use, customers are strongly advised to secure their Docker environments by following all the recommendations provided by Docker. Review and implement the best practices detailed at https://docs.docker.com/engine/security/security/.   Can we build Docker images for ThingWorx High Availability (HA) architecture? Yes. ThingWorx Dockerfiles are provided for both basic ThingWorx deployment architecture and HA ThingWorx deployment architecture.   How easy is the rehosting and upgrading of ThingWorx releases on Docker with existing data? In Kubernetes environment, data is kept in a separate volume and can be attached to different containers. When one container dies, the data can be attached to a different container and the container should start without issue. For more information, please refer to the upgrade section of the Building ThingWorx 8.3 Docker Images Guide.   Is it okay to use the Docker exec and access the bash shell to make config changes or should I always rebuild the image and re-deploy?­ Although using Docker exec to gain access to the container internals is useful for testing and troubleshooting issues, any changes made will not be saved after a container is stopped. To configure a container's environment, variables are passed in during the start process. This can be done with Docker start commands, using compose files with environment variables defined, or with helm charts. More detailed instructions can be found in the build guide or in this blog post .   What if there are issues? Should I call PTC Technical Support? We are providing the scripts and reference documents solely to empower our community to build ThingWorx Docker images. We believe that customers using Docker in their production processes would have expertise to manage running Docker containers themselves. If there are any issues or questions regarding the build scripts provided in the PTC official downloads portal, then customers can contact PTC Technical Support at 1-800-477-6435 or visit us online at: http://support.ptc.com. PTC does not provide support for orchestration troubleshooting.   What can you share about future roadmap plans? As we are enabling our customers and partners to build ThingWorx Foundation Platform Docker images, we plan to do the same for upcoming products such as ThingWorx Integration & Orchestration, ThingWorx Analytics, upcoming persistence providers such as InfluxDB, and many more. We also plan to provide additional reference architecture examples and use cases to help developers understand how to use Docker containers in their DevOps and production environments.   Where can I learn more about Docker containers and container orchestrators? See these resources below for additional information: https://training.docker.com/ https://kubernetes.io/docs/tutorials/online-training/overview/
View full tip
Hi Community,   I've recently had a number of questions from colleagues around architectures involving MQTT and what our preferred approach was.  After some internal verification, I wanted to share an aggregate of my findings with the ThingWorx Architect and Developer Community.   PTC currently supports four methods for integrating with MQTT for IoT projects. ThingWorx Azure IoT Hub Connector ThingWorx MQTT Extension ThingWorx Kepware Server Choice is nice, but it adds complexity and sometimes confusion.  The intent of this article is to clarify and provide direction on the subject to help others choose the path best suited for their situation.   ThingWorx MQTT Extension The ThingWorx MQTT extension has been available on the marketplace as an unsupported “PTC Labs” extension for a number of years.  Recently its status has been upgraded to “PTC Supported” and it has received some attention from R&D getting some bug fixes and security enhancements.  Most people who have used MQTT with ThingWorx are familiar with this extension.  As with anything, it has advantages and disadvantages.  You can easily import the extension without having administrative access to the machine, it’s easy to move around and store with projects, and can be up and running quite quickly.  However it is also quite limited when it comes to the flexibility required when building a production application, is tied directly to the core platform, and does not get feature/functionality updates.   The MQTT extension is a good choice for PoCs, demos, benchmarks, and prototypes as it provides MQTT integration relatively quickly and easily.  As an extension which runs with the core platform, it is not a good choice as a part of a client/enterprise application where MQTT communication reliability is critical.   ThingWorx Azure IoT Hub Connector Although Azure IoT Hub is not a fully functional MQTT broker, Azure IoT does support MQTT endpoints on both IoT Hub and IoT Edge.  This can be an interesting option to have MQTT devices publish to Azure IoT and be integrated to ThingWorx using the Azure IoT Hub Connector without actually requiring an MQTT broker to run and be maintained.  The Azure IoT Hub Connector works similarly to the PAT and is built on the Connection Server, but adds the notion of device management and security provided by Azure IoT.   When using Azure IoT Edge configured as a transparent gateway with buffering (store and forward) enabled, this approach has the added benefit of being able to buffer MQTT device messages at a remote site with the ability to handle Internet interruptions without losing data.   This approach has the added benefit of having far greater integrated security capabilities by leveraging certificates and tying into Azure KeyVault, as well as easily scaling up resources receiving the MQTT messages (IoT Hub and Azure IoT Hub Connector).  Considering that this approach is build on the Connection Server core, it also follows our deployment guidance for processing communications outside of the core platform (unlike the extension approach).   ThingWorx Kepware Server As some will note, KepWare has some pretty awesome MQTT capabilities: both as north and southbound interfaces.  The MQTT Client driver allows creating an MQTT channel to devices communicating via MQTT with auto-tag creation (from the MQTT payload).  Coupled with the native ThingWorx AlwaysOn connection, you can easily connect KepWare to an on-premise MQTT broker and connect these devices to ThingWorx over AlwaysOn.   The IoT Gateway plug-in has an MQTT agent which allows publishing data from all of your KepWare connected devices to an MQTT broker or endpoint.  The MQTT agent can also receive tag updates on a different topic and write back to the controllers.  We’ve used this MQTT agent to connect industrial control system data to ThingWorx through cloud platforms like Azure IoT, AWS, and communications providers.   ThingWorx Product Segment Direction A key factor in deciding how to design your solution should be aligned with our product development direction.  The ThingWorx Product Management and R&D teams have for years been putting their focus on scalable and enterprise-ready approaches that our partners and customers can build upon.  I mention this to make it clear that not all supported approaches carry the same weight.  Although we do support the MQTT extension, it is not in active development due to the fact that out-of-platform microservices-based communication interfaces are our direction forward.   The Azure IoT Hub Connector, being built on the Connection Server is currently the way forward for MQTT communications to the ThingWorx Foundation.   Regards,   Greg Eva
View full tip
Timers and Schedulers can also be created and configured programmatically via custom services. The following service, which can be created on any Thing, will create a new Timer using the following Inputs:         // create new Thing var params = { name: ThingName /* STRING */, description: undefined /* STRING */, thingTemplateName: "Timer" /* THINGTEMPLATENAME */, tags: undefined /* TAGS */ }; Resources["EntityServices"].CreateThing(params); // read initial configuration // result: INFOTABLE var configtable = Things[ThingName].GetConfigurationTable({tableName: "Settings"}); // update configuration with service parameters configtable.updateRate = updateRate configtable.runAsUser = user // set new configuration table var params = { configurationTable: configtable /* INFOTABLE */, persistent: true /* BOOLEAN */, tableName: "Settings" /* STRING */ }; Things[ThingName].SetConfigurationTable(params);   This code is an example which could also be used to create a new Scheduler. The configuration table for a Timer has the following attributes: updateRate enabled runAsUser The configuration table for a Scheduler has the following attributes: schedule enabled runAsUser  
View full tip
Original Post Date:            September 30, 2016   Description: This tutorial video will walk you through the installation process for the PostgreSQL-based version of the ThingWorx Platform (7.2) in a RHEL environment.  All required software components will be covered in this video.    
View full tip
The accuracy of a predictive model can be boosted in two ways: Either by embracing Feature engineering or by applying boosting algorithms straight away. There are multiple boosting algorithms like Gradient Boosting, XGBoost, AdaBoost, Gentle Boost etc. Every algorithm has its own underlying mathematics and a slight variation is observed while applying them. While working with boosting algorithms, we have come across two frequently occurring buzzwords: Bagging and Boosting. Bagging: It is an approach where you take random samples of data, build learning algorithms and take simple means to find bagging probabilities. Boosting: Boosting is similar, however the selection of sample is made more intelligently. We subsequently give more and more weight to hard to classify observations. Below are Default Algorithms used in Predictive Models generated in ThingWorx Analytics: Decision Tree Gradient Boost Linear regression Neural Net Random Forrest Logistic Regression Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differential loss function. Let’s begin with an easy example: Assume, you are given a previous model M to improve on. Currently you observe that the model has an accuracy of 80% (any metric). How do you go further about it? One simple way is to build an entirely different model using new set of input variables and trying better ensemble learners. On the contrary, we have a much simpler way to suggest. It goes like this: Y = M(x) + error What if we are able to see that error is not a white noise but have same correlation with outcome(Y) value. What if we can develop a model on this error term? Like:error = G(x) + error2 Probably, we will see error rate will improve to a higher number, say 84%. Let’s take another step and regress against error2: error2 = H(x) + error3 Now we combine all these together: Y = M(x) + G(x) + H(x) + error3 This probably will have a accuracy of even more than 84%. What if we can find an optimal weights for each of the three learners: Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4 How Gradient Boosting Works: 1. Loss Function: The loss function used depends on the type of problem being solved. It must be differential , but many standard loss functions are supported and you can define your own. A benefit of the gradient boosting framework is that a new boosting algorithm does not have to be derived for each loss function that may want to be used, instead, it is a generic enough framework that any differential loss function can be used. 2. Weak Learner: Decision trees are used as the weak learner in gradient boosting. Specifically regression trees are used that output real values for splits and whose output can be added together, allowing subsequent models outputs to be added and “correct” the residuals in the predictions. Trees are constructed in a greedy manner, choosing the best split points based on purity scores like Gini or to minimize the loss. 3. Additive Model: Trees are added one at a time, and existing trees in the model are not changed. A gradient descent procedure is used to minimize the loss when adding trees. we have weak learner sub-models or more specifically decision trees. After calculating the loss, to perform the gradient descent procedure, we must add a tree to the model that reduces the loss. Improvements to Basic Gradient Boosting: 1. Tree Constraints: It is important that the weak learners have skill but remain weak. Below are some constraints that can be imposed on the construction of decision trees: Number of trees: ​Generally adding more trees to the model can be very slow to over fit. The advice is to keep adding trees until no further improvement is observed. Tree depth: Deeper trees are more complex trees and shorter trees are preferred. Generally, better results are seen with 4-8 levels. Number of nodes or number of leaves: like depth, this can constrain the size of the tree, but is not constrained to a symmetrical structure if other constraints are used. Number of observations per split: Imposes a minimum constraint on the amount of training data at a training node before a split can be considered Minimum improvement to loss: Is a constraint on the improvement of any split added to a tree. 2. Weighted Updates: The contribution of each tree to this sum can be weighted to slow down the learning by the algorithm. This weighting is called a shrinkage or a learning rate. "Each update is simply scaled by the value of the “learning rate parameter v". 3. Stochastic Gradient Boosting: At each iteration a sub sample of the training data is drawn at random (without replacement) from the full training data set. The randomly selected sub sample is then used, instead of the full sample, to fit the base learner. 4. Penalized Gradient Boosting: The additional regularization term helps to smooth the final learnt weights to avoid over-fitting. Intuitively, the regularized objective will tend to select a model employing simple and predictive functions.
View full tip
Announcements