Community Tip - Did you get called away in the middle of writing a post? Don't worry you can find your unfinished post later in the Drafts section of your profile page. X
Hi,
I have question about the query performance for Data Table and Stream.
In PostgreSQL, the Data Table and Stream has the following table structure:
DataTable | Stream | |
Columns | entry_id|entity_id|entity_key|source_id|source_type|tags|time|field_values|full_text|location | entry_id|entity_id|source_id|source_type|tags|time|field_values|full_text|location |
Index | entity_key field_values |
(entity_id, time) entity_id |
Constraints | entry_id (entity_id, entity_key) |
(entity_id,source_id,time,source_type) |
According to the comparing above , the questions are :
Regards,
Sean
Solved! Go to Solution.
Thank you for your question. First, a quick discussion about various use cases for these storage methods:
Data tables are generally meant for static data that changes infrequently (like serial numbers).
Streams and value streams are meant for ingesting extremely quickly updating data (e.g. raw device data). Because Streams and Value Streams are optimized for ingestion, they have been designed with fewer indexes in place. Updating indexes significantly slows down ingestion, and the design for these Stream tables is meant to absorb tens of thousands of data points per second. In our testing we have found that adding additional indexes (even if it theoretically helps with fetching data) significantly slows down table writes and potentially leads to index collisions when multiple transactions attempt writes at the same time.
In terms of your question for Data Table query performance, there are a couple of notes:
1. Speaking of indexes, we do find that there is an index missing on Data Tables that does significantly boost performance. Please see: https://www.ptc.com/en/support/article?n=CS261063
2. With the index above in place, we do have customers with 200-300k entries in their Data Tables querying this data successfully (note: this is mostly static data)
3. Data Tables (as of current releases of ThingWorx) are not thread safe for writing/updating rows. Again, these were designed for slower operations and if we perform DT updates during ingestion, we can easily cause threading/update issues.
4. Streams were designed with multi-threaded inserts/updates in mind
5. If we always query by Key or add the index in #2, we can remain performant with 2-3x as much data as documented. However, if the desire is to maintain 0.5 million rows, then we'll likely be better off designing our own storage structure (you can configure JDBC connections to easily manipulate your own storage with all the needed indexes).
Please let me know if I can clarify any of the points mentioned above.
Thank you for your question. First, a quick discussion about various use cases for these storage methods:
Data tables are generally meant for static data that changes infrequently (like serial numbers).
Streams and value streams are meant for ingesting extremely quickly updating data (e.g. raw device data). Because Streams and Value Streams are optimized for ingestion, they have been designed with fewer indexes in place. Updating indexes significantly slows down ingestion, and the design for these Stream tables is meant to absorb tens of thousands of data points per second. In our testing we have found that adding additional indexes (even if it theoretically helps with fetching data) significantly slows down table writes and potentially leads to index collisions when multiple transactions attempt writes at the same time.
In terms of your question for Data Table query performance, there are a couple of notes:
1. Speaking of indexes, we do find that there is an index missing on Data Tables that does significantly boost performance. Please see: https://www.ptc.com/en/support/article?n=CS261063
2. With the index above in place, we do have customers with 200-300k entries in their Data Tables querying this data successfully (note: this is mostly static data)
3. Data Tables (as of current releases of ThingWorx) are not thread safe for writing/updating rows. Again, these were designed for slower operations and if we perform DT updates during ingestion, we can easily cause threading/update issues.
4. Streams were designed with multi-threaded inserts/updates in mind
5. If we always query by Key or add the index in #2, we can remain performant with 2-3x as much data as documented. However, if the desire is to maintain 0.5 million rows, then we'll likely be better off designing our own storage structure (you can configure JDBC connections to easily manipulate your own storage with all the needed indexes).
Please let me know if I can clarify any of the points mentioned above.
Adding some information to Tudor's post:
The Stream and ValueStream are time-series data, so they are better sorted than Datatable, the search could be faster. Especially in 8.4 we introduced InfluxDB as the additional Persistence Package for Time-series data storage, and it's proved that searching with InfluxDB will be a few times faster than PostgreSQL, so for sure it will be faster than using Datatable.
Agreed: Stream/Value Stream queries are faster if you need to find time series data, but I don't believe this is the use case here. If you're looking up relational data (time insensitive), the lack of indexing will degrade performance compared to data tables.
@Tudor ,
So,
for Stream , the query performance is good if set query condition on time ONLY, but should avoid to query on the other properties as they're not indexed .
for DataTable, after fixing https://www.ptc.com/en/support/article?n=CS261063, the query performance is good to query any property as the JSONB field field_values is indexed (less than 300,000 records), but should avoid to write to DataTable in high concurrency.
is it Right ?
@seanccc -- your understanding is correct. The more we can leverage indexes already in the DB, the better our performance.
Another thing to speedup DataTable Queries it's to use "values" parameter instead of "query" parameter, actually (I'm not 100% sure if it's with all TW realeases) .
You can add custom indexes on TW DataTables through DataTable > Configuration > Index Settings, then in order to use this indexes you MUST use "values" parameter instead of "query" parameter. This has the limitation that you can only query for exactly the values on the index but it speeds up a lot queries.
Sample of querying by index fields (two fields on the same index):
// -- you defined myIndexedField1,myIndexedField2 as compound index on myDataTable var values = Resources["InfoTableFunctions"].CreateInfoTable();; values.AddField({ name: "myIndexedField1", baseType:"STRING" }); values.AddField({ name: "myIndexedField2", baseType:"STRING" }); values.AddRow({ myIndexedField1: valueToQueryIndexedField1, myIndexedField2: valueToQueryIndexedField2, }); myDataTable.QueryDataTableEntries({ values: values, source: something or undefined, query: something* or undefined }); * on this case first it will filter by values and source (both indexed) and then with the results it will apply the query parameter