10-Marble
April 12, 2024
Solved

Entity ID in time series scoring


In this article, https://community.ptc.com/t5/IoT-Tips/Considerations-for-Handling-Time-Series-Data/ta-p/818763, the entity ID is defined as follows:

"'ENTITY_ID' [is] the identifier for an entity, such as a machine serial number. The ENTITY_ID field should remain the same as long as there are no missing timestamps and it is within the same asset but should be different for different assets or asset runs in order to accurately assign history during model training and scoring."

"If there are gaps in the time series data, it is recommended to restart the series after the gap as a new entity."

This makes perfect sense to me: to avoid mixing training data from different machines or different runs, you separate the dataset with the entity ID label. In my case, I have only one machine/system, but several different runs spread over a long time window. I would therefore assign a different entity ID to each of these runs.
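A minimal sketch of that per-run labeling, in plain Python rather than anything specific to the Analytics tooling: it starts a new entity ID whenever the step between consecutive timestamps exceeds the expected sampling interval. The function name `assign_entity_ids`, the `run` prefix, and the one-minute interval are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def assign_entity_ids(timestamps, sampling_interval, prefix="run"):
    """Pair each timestamp with an entity ID, starting a new ID after every gap.

    A gap is any step between consecutive timestamps that is larger than
    the expected sampling interval. Timestamps are sorted before labeling.
    """
    labeled = []
    run_number = 0
    previous = None
    for ts in sorted(timestamps):
        # The first sample, or any sample that follows a gap, opens a new run.
        if previous is None or ts - previous > sampling_interval:
            run_number += 1
        labeled.append((ts, f"{prefix}{run_number}"))
        previous = ts
    return labeled


# Hypothetical data: two runs sampled every minute, several days apart.
step = timedelta(minutes=1)
day_one = [datetime(2024, 1, 1, 8, 0) + i * step for i in range(3)]
day_two = [datetime(2024, 1, 5, 8, 0) + i * step for i in range(2)]

for ts, entity_id in assign_entity_ids(day_one + day_two, step):
    print(ts.isoformat(), entity_id)
```

The resulting IDs would go into the ENTITY_ID column of the training dataset, so that each run is a gap-free series of its own.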

 

My doubt arises when requesting predictions. The scoring dataset needs to include an entity ID. This makes total sense when the entity ID distinguishes between different assets, since it is basically another feature/label. But in my case, which entity ID should I pass for scoring?

For example, suppose I have data from 3 runs on 3 different days, with a large time gap between them. In the training dataset I need to assign an entity ID to each one, let's say run1, run2, and run3. When scoring in the future, which entity ID should I use: run1, run2, or run3? Why would I choose one over the other if they were only separated to avoid mixing runs?


Rocko
19-Tanzanite
Answer
April 15, 2024

I haven't tried it, but I think that in this case the ENTITY_ID can be the same for all sets, because you won't have timestamp collisions. ENTITY_ID makes the records unique when multiple machines are logged at the same time. Maybe you can try using a constant ENTITY_ID in training (or leaving it out altogether).

If that doesn't work because the gaps are too large, it should make no difference which ENTITY_ID you pass in for scoring: you trained a single model that should be valid for all the machines/series, not one model per machine/series packed into one. That is my assumption, at least. It's also quick to try out if you already have the model.

10-Marble
April 15, 2024

Hi @Rocko, thanks for your answer. I just tried a couple of things. The entity ID must be present and unique for every run if there are gaps in time. If I set a constant entity ID, training fails with a warning that the time sampling is not consistent with what was configured in the dataset import.

"there should be no difference in which Entity ID you pass in for scoring": I would think so too, but confirmation would be great. You also say it would be quick to try. How do you suggest I go about that? I could score the same data and model with different entity IDs, but how will I know whether it makes a difference? I must be missing something.

Rocko
19-Tanzanite
April 16, 2024

Well, maybe I'm missing something 🙂 I would have scored the same sample data with two different entity IDs and checked whether the goal value differs. If it doesn't, the entity ID made no difference.
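That check could be sketched roughly as follows. This is only an illustration: `score_fn` is a hypothetical stand-in for whatever call actually sends rows to the trained model and returns the predicted goal values, and `placeholder_score` and the `temperature` field exist only to make the example runnable. Neither is part of any real API.

```python
import math


def entity_id_changes_predictions(score_fn, rows, entity_ids, abs_tol=1e-9):
    """Score identical rows under each entity ID and report whether results differ.

    score_fn is assumed to take a list of row dicts and return one numeric
    prediction (the goal value) per row.
    """
    baseline = None
    for entity_id in entity_ids:
        # Same feature values every time; only the ENTITY_ID column changes.
        relabeled = [{**row, "ENTITY_ID": entity_id} for row in rows]
        predictions = list(score_fn(relabeled))
        if baseline is None:
            baseline = predictions
        elif len(predictions) != len(baseline) or any(
            not math.isclose(p, b, rel_tol=0.0, abs_tol=abs_tol)
            for p, b in zip(predictions, baseline)
        ):
            return True
    return False


def placeholder_score(rows):
    # Stand-in for the real scoring request; it ignores ENTITY_ID entirely.
    return [2.0 * row["temperature"] for row in rows]


sample = [{"temperature": 20.5}, {"temperature": 21.0}]
print(entity_id_changes_predictions(placeholder_score, sample, ["run1", "run3"]))
```

If the function returns False for the real scorer across run1, run2, and run3, the choice of entity ID at scoring time did not affect the goal value for that sample.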