Archiving Data Table records

jrumpza · ‎Oct 29, 2020

I would like to archive some Data Table records we have stored in ThingWorx (with a PostgreSQL Persistence Provider). I don't have much experience with setting up/optimizing PostgreSQL databases, so I am looking for some input. Note that we have both an application server and a separate database server which also has a PostgreSQL persistence provider set up. The records in this table are currently stored using that external persistence provider on the database server, not the application server.

Old records would not need to be regularly accessed and I want to offload them to clean things up a bit and reduce the amount of records in this particular Data Table. The way I see it there are 3 ways to do this:

Add the records to a separate Data Table which also stores data using the external persistence provider
Set up a separate database/table in PostgreSQL (not part of the ThingWorx data model), and create SQL queries using a JDBC connection to retrieve records as needed
Add the records to a flat file stored on the server

With option 1, would it improve query time if they are still in the ThingWorx data model? I know in the ThingWorx PostgreSQL database model, all data table entries are stored in the data_table table. Thus, if I simply move records from one Data Table to another, I am not sure if that would truly help query performance.

With option 2, if I set up a separate database in the same PostgreSQL installation, would there be any processing draw away from the main ThingWorx Data Table transactions? Assume that these records would hardly ever (perhaps even never) be accessed – if there are a large number (10s of thousands) of records sitting there in a separate database and are not updated or even queried, will PostgreSQL require server resources to maintain them, or is it only when they are queried?

I assume option 3 should definitely free up processing power, but it would be more annoying in case we do want to access the records in a nice way (in a mashup, for instance). So if option 2 and option 3 have the same net result on performance (again, assuming I am hardly ever touching the records), I would want to go with option 2.

Any inputs appreciated! Thank you!

Ciprian-Traian · ‎Oct 31, 2020

Hello @jrumpza,

Have you checked the Application Development Guide to see options regarding the storage of data, best practices for handling data-centric modeling
and determining the correct datastore option:

If you have data that will not change or will be overwritten the next time it is changed/loaded, and it is associated with a Thing, create an infotable property for that Thing and assign a proper data shape.
In this way, you can access the data through the Thing. You can also use configuration tables,or for larger amounts of data, use a data table
Use data caching as much as possible.
For example, instead of querying the database on each Data Change event, implement a cache, as an infotable, that is refreshed at set intervals.
Archive the data that you no longer need.

While designing your application, you must decide what data is frequently used.
You can store this data in the application database.
Move the old data as soon as possible to an external server, such as a ThingWorx federated instance or a database server.
Provide start and end date parameters to the query methods to limit the amount of data that the query retrieves.
This reduces the processing time and improves performance.
Ensure that your Data Tables have less than 100,000 rows.
Querying data from Data Tables and Streams should only take a few seconds.
If these Data Tables and Streams have more than 100,000 rows, the queries perform slowly.
Decide how much data you need to store. If data can be stored in small data sets (around 100,000 records or less), use the ThingWorx platform storage.
Decide what kind of data your application has.
How frequently you query your data affects the performance of the application. Depending on how frequently you want to query your data, it is recommended to use either of the following options:
To query data frequently, use a JDBC connector to query the database. You can use this JDBC connector with PostgreSQL.
If you do not need to retrieve historical data very frequently, you can move all data to a separate ThingWorx instance (in a Data Table or Stream) or to a database server.

Hope it helps,

jrumpza · ‎Nov 02, 2020

I have seen that guide. It provides some help, but in my case I am looking for more clarity in how to best "Archive the data that you no longer need" - whether that means in a separate Data Table, a separate database in the same PostgreSQL instance, or simply a .csv file.

abhiramk · ‎Nov 02, 2020

Hello @jrumpza

If there are more entries, I suggest better to avoid data_tables, because irrespective of data_tables you create, everything will be stored in data_table table in postgres which will degrade the query performance.

Option one won't be a good one, if you have frequent updates and queries.

Option2 can be good option, you can create new table than separate db and add indexes to table based on your usage. You can use JDBC connector to update and query entries using indexed columns. Will be efficient and make sure you vacuum table if there are frequent update/deletes based on your usage.

Option 3 won't be a good option, keeping in flat file structure, for updating and insertion you need to parse and then you to do conversion.

Keeping Infotables also good option, if there are less entries. You can create a persistent infotable property and update/ read it from it. You an also keep it in separate properties for different types and access it based on your usage. Also you can make property as not persistent and read it from server and keep it in Infotable property on Thing start-up event. On property update event, write it to DB.

I hope this will help you.

-regards,

Abhiram

jrumpza · ‎Nov 02, 2020

Thanks @abhiramk - I was thinking the same about option 1 since it is all stored in the same data_table table in Postgres. It's good to have that confirmation.

I will plan on going for option 2 and setting up a new table in Postgres. I have always assumed when doing so I should use a new database in Postgres, rather than the thingworx database which holds the ThingWorx model/data. Is this a fair assumption? Or is it fine to create a new table within the thingworx Postgres database?

abhiramk · ‎Nov 03, 2020

You can create a table in thingworx DB, but you need to take care of it when you're upgrading your thingworx version as well as moving from one instance to other.