Analytics Models Version Control


With the advent of faster and cheaper hardware, and with vast improvements in connectivity such as Gigabit networks and 5G, data is collected and processed at an ever-increasing pace. Consequently, the Analytical Models built on top of such data need to evolve over time. As part of this evolution, modelling experiments will be performed with more data, new variables, new techniques, and different training parameters. The resulting models need to be tracked and monitored, and the appropriate version needs to be used for scoring in each scenario. This creates the need for version control of Analytical Models.

ThingWorx Analytics makes it easy to implement an “Analytics Model Version Control” system via two mechanisms:

1. A job ID and timestamp-based identification system: if a model with the same jobName (model name) is retrained, the old model still exists and can be easily retrieved.
2. A tagging mechanism for the above jobs.

Before using the above techniques to version control Analytics Models, it is critical to decide what represents “the same model” to be versioned. In Software Engineering, the thing being versioned is a file or folder that evolves over time. For Analytics Models, there can be several, potentially customer-specific, definitions. For example, one definition could be “same training parameters, but trained on the latest, most comprehensive data available”. Another, more relaxed definition could be “any training parameters, but same training dataset and goal” or even “any training parameters, on any historical version of the training dataset”.
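To make the three example definitions concrete, here is a minimal Python sketch that expresses each one as a key function: two training runs count as versions of "the same model" when their keys are equal. The TrainingRun fields and the function names are hypothetical and are not part of ThingWorx Analytics. Which fields belong in each key is an interpretation of the definitions above, including the assumption that the goal and dataset name stay fixed in all three.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical description of one training run (not a ThingWorx type)."""
    goal: str
    dataset_name: str
    dataset_version: str
    params: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs


def key_same_params_latest_data(run: TrainingRun) -> tuple:
    # "Same training parameters, trained on the latest data":
    # the dataset version is ignored, the parameters must match.
    return (run.goal, run.dataset_name, run.params)


def key_same_dataset_and_goal(run: TrainingRun) -> tuple:
    # "Any training parameters, but same training dataset and goal":
    # the parameters are ignored, the dataset version must match.
    return (run.goal, run.dataset_name, run.dataset_version)


def key_any_dataset_version(run: TrainingRun) -> tuple:
    # "Any training parameters, on any historical version of the dataset":
    # both the parameters and the dataset version are ignored.
    return (run.goal, run.dataset_name)
```

Under the first key, retraining with a different tree depth starts a new model lineage; under the other two it produces a new version of the existing one. Whichever key the organization settles on determines which runs should share a jobName.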

Once this definition is agreed upon within the customer's organization, and if training is done in a ThingWorx service by calling the APIs, versioning is straightforward: for a given jobName (model name), query the tags for a LatestVersion-type tag, increment it, and create the new model with the same jobName and the incremented tag. Any model version with the same jobName, along with its corresponding performance metrics, can then be accessed using the tag. Additional tags (such as techniques used, dataset version, etc.) can be added if desired to make retrieval of context-dependent models more efficient.
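The query, increment, and create flow described above can be sketched in Python as follows. This is an illustration only: InMemoryJobStore is a hypothetical in-memory stand-in for the training service and does not reproduce the ThingWorx Analytics API, and storing the version number as the value of a tag named LatestVersion is an assumption about how such a tag might be used.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional

VERSION_TAG = "LatestVersion"  # assumed tag name holding the version number


@dataclass
class Job:
    job_id: str
    job_name: str
    created: datetime
    tags: Dict[str, str] = field(default_factory=dict)


class InMemoryJobStore:
    """Hypothetical stand-in for the training service (not the real API)."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._ids = count(1)

    def create_job(self, job_name: str, tags: Dict[str, str]) -> Job:
        # Every training run gets a fresh job ID and timestamp, so older
        # jobs with the same jobName are never overwritten.
        job = Job(f"job-{next(self._ids)}", job_name,
                  datetime.now(timezone.utc), dict(tags))
        self._jobs.append(job)
        return job

    def find_jobs(self, job_name: str) -> List[Job]:
        return [j for j in self._jobs if j.job_name == job_name]


def latest_version(store: InMemoryJobStore, job_name: str) -> int:
    """Return the highest version tag for job_name, or 0 if none exists."""
    versions = [int(j.tags[VERSION_TAG])
                for j in store.find_jobs(job_name) if VERSION_TAG in j.tags]
    return max(versions, default=0)


def train_new_version(store: InMemoryJobStore, job_name: str,
                      extra_tags: Optional[Dict[str, str]] = None) -> Job:
    """Create a job with the same jobName and an incremented version tag."""
    tags = dict(extra_tags or {})
    tags[VERSION_TAG] = str(latest_version(store, job_name) + 1)
    return store.create_job(job_name, tags)


def get_version(store: InMemoryJobStore, job_name: str,
                version: int) -> Optional[Job]:
    """Look up one specific version of a model by its tag."""
    for job in store.find_jobs(job_name):
        if job.tags.get(VERSION_TAG) == str(version):
            return job
    return None
```

For example, calling train_new_version(store, "pump_failure") twice yields two jobs with distinct job IDs and version tags "1" and "2", and get_version(store, "pump_failure", 1) still returns the first one. Extra tags such as {"technique": "gbm"} can be passed through extra_tags to support context-dependent retrieval.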