How do Thingworx DataTables/Streams/Wikis/Blogs Work? Scaling with Data
After I wrote this in a response I figure I'd pull it out as a blog post.
Thingworx is designed to store all DataTable data in a single (underlying database H2/PostGres and yes even Cassandra) table structure. It has therefor limited indexing, I'm checking this in PostGres and Streams have an ID and TimeStamp, DataTables have an ID and some sort of composite index on the JSON Blob Values field. The rest of the information is stored as a JSON Blob (also true for Streams, Wiki and Blog entries)
This means that when you query from these data structures, for a Stream it first does a datetime filter, brings back the records, reconstructs the information in an actual infotable and then applies the remainder of your query if any. Not exactly sure how the DataTable index works
I believe the DataTable indexes may be kept on the side as Solar indexes - not sure since I see an Index on Values in the schema.
However it is true that the performance of DataTables isn't that good, Streams actually perform much much better and you can get reasonable performance out of streams that have millions of records.
DataTables have their limit not at 10K rows but more so in the 250-500K range.
Depending on how much data and especially what you need to do with it, an additional Relational Database (even another PostGres) and then using your own table design and a JDBC connection can speed up performance a LOT because you'll be leveraging another Server to do database side processing of your query before it comes back to Thingworx.
Re: How do Thingworx DataTables/Streams/Wikis/Blogs Work? Scaling with Data
To add additional information to this.
So you can tell how a Stream or DataTable is
Turns out that first datetime is filtered upon and is indexed
Besides that it will filter on the DataBase server for all those 5 default fields (even though not all are indexed) including StreamID which is the name of the stream.
Before bringing back the data and then filtering on the DataShape content.
This means that you potentially could be faster by executing a call based on timestamp plus on of the 5 fields mentioned. A highly recommended approach for Event based data would be Source and DataTags.