Entity ID in time series scoring

Question

In this article, https://community.ptc.com/t5/IoT-Tips/Considerations-for-Handling-Time-Series-Data/ta-p/818763, the entity ID is defined as follows:

"ENTITY_ID [is] the identifier for an entity, such as a machine serial number. The ENTITY_ID field should remain the same as long as there are no missing timestamps and it is within the same asset but should be different for different assets or asset runs in order to accurately assign history during model training and scoring."

"If there are gaps in the time series data, it is recommended to restart the series after the gap as a new entity."
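A minimal sketch of that rule for a single machine, in plain Python: walk the sorted timestamps and start a new ENTITY_ID whenever the gap to the previous sample exceeds a threshold. The function name, the run1, run2, ... naming and the max_gap threshold are illustrative assumptions, not something the article or PTC tooling defines.

```python
from datetime import datetime, timedelta

def assign_entity_ids(timestamps, max_gap, prefix="run"):
    """Label each timestamp with a run ID, starting a new run after any gap > max_gap.

    Assumes `timestamps` is sorted ascending. `max_gap` is a hypothetical
    threshold chosen for your sampling rate; it is not a PTC setting.
    """
    ids = []
    run = 0
    previous = None
    for ts in timestamps:
        # The first row, or any gap larger than max_gap, starts a new entity.
        if previous is None or ts - previous > max_gap:
            run += 1
        ids.append(f"{prefix}{run}")
        previous = ts
    return ids

# Three short runs on three different days, sampled once per minute.
start_days = [datetime(2023, 1, 1, 8), datetime(2023, 1, 5, 8), datetime(2023, 1, 9, 8)]
stamps = [day + timedelta(minutes=m) for day in start_days for m in range(3)]
entity_ids = assign_entity_ids(stamps, max_gap=timedelta(minutes=5))
print(entity_ids)
```

Each day's samples get the same ID (run1, run2, run3), and the multi-day gaps between them start a new ID.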

This makes perfect sense to me: to avoid mixing training data from different machines or different runs, you separate the dataset with the entity ID label. In my case I have only one machine/system, but several different runs spanning a large time window, so I would assign a different entity ID to each of these runs.
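Why the ID matters for "assigning history" can be illustrated with a small sketch. It assumes the tool builds fixed-length lookback windows separately for each entity. This is an assumption for illustration only, not PTC's actual implementation, and the function name is hypothetical.

```python
from collections import defaultdict

def lookback_windows(rows, window):
    """Build fixed-length history windows separately for each entity.

    `rows` is a list of (entity_id, timestamp, value) tuples, assumed sorted
    by timestamp within each entity. Because windows are built per entity,
    no window ever spans the boundary between two runs.
    """
    values_by_entity = defaultdict(list)
    for entity_id, _timestamp, value in rows:
        values_by_entity[entity_id].append(value)

    windows = []
    for entity_id, values in values_by_entity.items():
        # Slide a window of `window` consecutive values over this entity only.
        for end in range(window, len(values) + 1):
            windows.append((entity_id, values[end - window:end]))
    return windows

rows = [
    ("run1", 1, 1.0), ("run1", 2, 2.0), ("run1", 3, 3.0),
    ("run2", 101, 10.0), ("run2", 102, 11.0), ("run2", 103, 12.0),
]
for w in lookback_windows(rows, window=2):
    print(w)
```

With separate IDs, no window contains both 3.0 and 10.0. If every row carried the same ID, the sketch would produce a window [3.0, 10.0] that spans the gap between the two runs.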

My question is about scoring. The scoring dataset also needs an entity ID. That makes total sense when the ID separates different assets: it is basically another feature/label. But in my case, which entity ID should I pass for scoring?

For example, say I have data from 3 runs on 3 different days, with large gaps between them. In the training dataset I need to assign an entity ID to each one, let's say run1, run2 and run3. When scoring in the future, which entity ID should I use: run1, run2 or run3? Why would I choose one over the other if they were only separated to avoid mixing runs?

Rocko · Accepted Answer

I haven't tried it, but I think that in this case the ENTITY_ID can be the same for all sets, because you won't have timestamp collisions. ENTITY_ID makes the records unique when multiple machines are logged at the same time. Maybe you can try using a constant ENTITY_ID in training (or leaving it out entirely). If that doesn't work because the gaps are too large, it should make no difference which ENTITY_ID you pass in for scoring: you trained one model that should be valid for all the machines/series, not one model per machine/series packed into one. That's my assumption, but it's also quick to try out if you already have the model.
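Whether a single constant ENTITY_ID is safe for one machine comes down to whether any two records share a timestamp. A minimal stdlib sketch of that check, assuming the data is available as (entity_id, timestamp) pairs; the function name is hypothetical and not part of any PTC API.

```python
from collections import Counter

def has_timestamp_collisions(rows):
    """Return True if any timestamp occurs more than once across all rows.

    `rows` is a list of (entity_id, timestamp) pairs. If this returns False,
    replacing every ENTITY_ID with one constant value still leaves each
    record uniquely identified by its timestamp.
    """
    counts = Counter(timestamp for _entity_id, timestamp in rows)
    return any(n > 1 for n in counts.values())

# One machine, three runs on different days: no shared timestamps.
single_machine = [("run1", "2023-01-01T08:00"), ("run2", "2023-01-05T08:00"),
                  ("run3", "2023-01-09T08:00")]
# Two machines logged at the same instant: the timestamp alone is ambiguous.
two_machines = [("machineA", "2023-01-01T08:00"), ("machineB", "2023-01-01T08:00")]

print(has_timestamp_collisions(single_machine))  # False
print(has_timestamp_collisions(two_machines))    # True
```

This only checks uniqueness. It does not say whether the model copes with large gaps inside one entity, which is the case the answer says to fall back from.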
