Batch scoring is used when a large amount of data needs to be scored.
A batch scoring usually follows steps similar to these:
1. Upload the historic data
2. Create a new model with this historic data
3. Upload the new data, that is, the data to be scored
4. Run a prediction job to score the new data
5. Retrieve the prediction job result
Uploading the new data can be done in different ways.
For a large amount of data, it can be easier to upload the data via a CSV file, in the same way as the historic data. This is the approach used by ThingWorx Analytics Builder.
If the amount of data is more limited, it can be sent in the body of the scoring request. The post "Analytics: Prediction Methods Mashup" shows a good example of how to do this using the PredictionThing.BatchScore service.
The rest of this section focuses on ThingWorx Analytics Builder, that is, on uploading the new data via a CSV file.
To run the scoring job only on the new data in step 4 above, we need to be able to filter the newly added rows. If the dataset already has a suitable column/feature, such as a timestamp, we can use it to score only the new data with the filter timestamp > newdate, assuming all data are in chronological order.
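The timestamp filter can be illustrated outside of ThingWorx with a short Python sketch. This is only an illustration of the idea, not the filter syntax used by Analytics Builder: the rows_after helper, the column names and the sample values are all hypothetical, and the timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime

def rows_after(rows, cutoff, ts_field="timestamp"):
    """Return the rows whose timestamp is strictly later than cutoff."""
    return [r for r in rows if datetime.fromisoformat(r[ts_field]) > cutoff]

# Hypothetical dataset: one historic row followed by two newer rows.
rows = [
    {"timestamp": "2023-01-01T00:00:00", "pressure": "1.2"},
    {"timestamp": "2023-02-01T00:00:00", "pressure": "1.4"},
    {"timestamp": "2023-03-01T00:00:00", "pressure": "1.1"},
]

# newdate marks the end of the historic data; only later rows are scored.
newdate = datetime(2023, 1, 15)
new_rows = rows_after(rows, newdate)
```

The strict comparison mirrors timestamp > newdate, so a row stamped exactly at newdate is treated as historic data.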
If the dataset has no such feature, we have to add one beforehand, when we first upload the historic data in step 1 above.
We often use a new column/feature named record_purpose for this.
The initial data can take the value training for this record_purpose feature, since they are used to create the initial model.
The newly added data to be scored can then take any other value that identifies only those rows. It is important to note that this record_purpose feature needs to be set with the opType INFORMATIONAL so that it is not taken into account by the learning algorithms.
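As a rough sketch of how the record_purpose column could be prepared before the CSV upload, the Python below tags each row and then selects only the rows to be scored. The helper names (tag_rows, to_csv), the temp and failure columns and the value scoring are assumptions made for illustration; only record_purpose and training come from the description above, and nothing here calls a ThingWorx API.

```python
import csv
import io

def tag_rows(rows, purpose):
    """Return copies of the rows with a record_purpose column added."""
    return [{**row, "record_purpose": purpose} for row in rows]

def to_csv(rows):
    """Serialize a list of dict rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical historic data, tagged as training before the first upload.
historic = tag_rows(
    [{"temp": "20.1", "failure": "0"}, {"temp": "35.7", "failure": "1"}],
    "training",
)

# Hypothetical new data, tagged with a value that identifies only these rows.
new = tag_rows([{"temp": "22.4", "failure": ""}], "scoring")

historic_csv = to_csv(historic)

# Equivalent of the filter applied to the prediction job.
to_score = [r for r in historic + new if r["record_purpose"] == "scoring"]
```

Any value other than training works for the new rows, as long as the same value is used in the filter of the prediction job.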
The video below shows these steps in ThingWorx Analytics Builder.
Real time scoring
Real time scoring is better suited to small amounts of data.
Real time scoring can be done either via the Analytics Server PredictionThing RealTimeScore service or using the Analytics Manager framework.