Considerations for Handling Time Series Data

Yes

Time series prediction uses a model to predict future values based on previously observed values. Time series data differs somewhat from non-time series data in both the formatting of the data and the training of predictive models. This article will highlight several important considerations when dealing with time series data.

Preparing Time Series Data:

The data must contain exactly one field with Op Type “TEMPORAL” and one field with Op Type “ENTITY_ID”, which defines the identifier for an entity, such as a machine serial number. The ENTITY_ID field should remain the same as long as there are no missing timestamps and it is within the same asset but should be different for different assets or asset runs in order to accurately assign history during model training and scoring.

The TEMPORAL field is a numeric field indicating the order of the data rows for a specific entity . One should also ensure that data is formatted such that the timestamps are equally spaced (for example, one data point every minute) and that no gaps exist in the sequence of numbers.

If there are gaps in the time series data, it is recommended to restart the series after the gap as a new entity. Alternatively, if the gap is small enough (few data points), linear interpolation based on the gap endpoint values within the same entity is generally acceptable.

Model Creation in Time Series:

When creating a timeseries model in Analytics Builder, you will be asked to specify a lookback size and lookahead parameter. The lookback size determines how many historical datapoints (including the current row) will be used in the model. The lookahead indicates how many time steps ahead to predict. If the value of the goal variable is not known at time of scoring, unchecking Use Goal History will use the goal column during training but not its history during scoring.

Time Series models can also be created in Services using the Training Thing. The lookback size and lookahead parameter are specified in the CreateJob service. The virtualSensor field is used to indicate if the model should be trained to predict values for a field that will not be available during scoring. For example, one can train a time series model to predict Volume using evolving Temperature and Pressure, based on sensor data for these three variables over a period of time. However, the Volume sensor may be removed from further assets in order to reduce costs, and the predictive model can be used instead.

Two important considerations:

ThingWorx Analytics will expand historical data in the time series into new columns. This process creates new features using the values of the previous time steps. Additionally, low order derivatives, together with average and standard deviation features are computed over small contiguous subgroups of the historical data.

The expansion process can make the dataset exceptionally wide, so time series training is generally significantly slower compared to training with no history on the same dataset. This gets exacerbated when lookback size = 0 (auto-windowing, a process where the system is trying to find the optimal lookback). If there are columns that are not changing or change infrequently (such as a device serial number or zip code of the device’s location), these should be marked as Static when importing the data. Any columns labeled Static will not be expanded to create new features. Care also needs to be taken to exclude any features that are known to not be relevant to the prediction.

Using a large lookback can eliminate how many examples / entities the model has available to train. For example, if a lookback of 8 is used, then any entities that have less than 8 examples will not be used in training. For the same reason, scoring for time series produces less results than the number of rows provided as input: if 10 rows are provided and lookback is 6, then only 5 predictions will be produced.