Solved: Best practice to store data in Stream

UN_10218888 · ‎Nov 23, 2022

I'm working on use case,

there are 20 Plants. each plant contains 100+ equipment with each equipment contain 5-6 properties.

now I want to log each property at one minute interval timestamp.

I am logging property by using stream.

shall I create 100+ stream for each equipment 1:1 ratio or 20 stream for 20 plants one stream for one plant.

database is PostgreSQL.

If I query historical data in future it should not affect on performance.

please suggest best way to do this.

VladimirRosu · ‎Nov 23, 2022

Hi @UN_10218888 ,

What you are asking here is usually addressed via an architecture workshop, and usually the answers received in this forum format can help reaching some decisions, but most probably not the best ones.

What we can offer as advice is relatively simple:

Never use PostgreSQL for this kind of historical information, instead use InfluxDB, as it is a proper time-series database
You mentioned Streams, but ThingWorx offers another object, called Value Streams for this type of historical data. Was there a reason why you could not use Value Streams here?
The biggest performance advantage is if you shard your data storage in different databases (Persistence Providers). Only in this way you could really separate the query load. In a single database they all end in the same SQL table (in PostgreSQL). Of course this results in more complex IT management, but it is a valid option.
Regarding querying historical data, the performance depends on what size of data you will query, how frequent and the complexity of your query. One thing you should always remember is that you should always provide a start and end date to the to the classic QueryPropertyHistory methods you should use (https://www.ptc.com/en/support/article/cs260742). This is highly recommended because otherwise the system will load in the platform memory all the value stream/stream data that you queried, and in some situations it can be quite high
You must decide what is the period of time for which you store data in a ValueStream/Stream. I say this because there's no system out there that can ingest data forever without requiring money/disk space. As a consequence you should either purge the data periodically or send it to a third party system for archival.

View solution in original post

VladimirRosu · ‎Nov 23, 2022

Hi @UN_10218888 ,

What you are asking here is usually addressed via an architecture workshop, and usually the answers received in this forum format can help reaching some decisions, but most probably not the best ones.

What we can offer as advice is relatively simple:

Never use PostgreSQL for this kind of historical information, instead use InfluxDB, as it is a proper time-series database
You mentioned Streams, but ThingWorx offers another object, called Value Streams for this type of historical data. Was there a reason why you could not use Value Streams here?
The biggest performance advantage is if you shard your data storage in different databases (Persistence Providers). Only in this way you could really separate the query load. In a single database they all end in the same SQL table (in PostgreSQL). Of course this results in more complex IT management, but it is a valid option.
Regarding querying historical data, the performance depends on what size of data you will query, how frequent and the complexity of your query. One thing you should always remember is that you should always provide a start and end date to the to the classic QueryPropertyHistory methods you should use (https://www.ptc.com/en/support/article/cs260742). This is highly recommended because otherwise the system will load in the platform memory all the value stream/stream data that you queried, and in some situations it can be quite high
You must decide what is the period of time for which you store data in a ValueStream/Stream. I say this because there's no system out there that can ingest data forever without requiring money/disk space. As a consequence you should either purge the data periodically or send it to a third party system for archival.