Re: When to split a stream in two or more streams

Willie · ‎Feb 15, 2021

I have a stream that has about 40 fields. The data in each field is unique so the stream cannot be normalized. About 2000 records are created each month. Most of the time, when the stream is queried, not all of the fields are needed. Should the stream be split into two or more streams? Will this have any noticeable improvement in performance? When performing queries, my understanding is that we cannot select the fields to be outputted as in the example below:

https://bi-bigdata.com/2012/09/02/select-vs-select-in-sql-server-query

PaiChung · ‎Feb 15, 2021

Assuming you are using a regular Stream vs. ValueStream, I don't think Stream supports querying only specified columns unlike ValueStreams.

If you know that you need certain fields more often than others, splitting can help, especially if you use multiple persistence providers.

ie one Stream is supported by schema "A" and the other Stream is supported by schema "B"

First advantage though is less total data retrieved and transferred, but if you use two or more persistence providers than the overall table size of the stored stream reduces significantly.

Willie · ‎Feb 15, 2021

Would there be any significant performance improvement if there is only one persistence provider? Would the change in performance be almost negligible? Is 40 fields a lot for a stream? There are times when all of the fields need to be displayed.

PaiChung · ‎Feb 15, 2021

If you use one persistence provider, that means all the records will still reside in one and the same table. There is limited indexing on Streams so the advantage will be reduced to nil. I can't say for sure, you would have to test it.

When you do need all the fields, as long as you have a field to allow joining, you can use the very fast in memory infotable service operations to put the table(s) together. The only part that you would have to be aware of is the amount of data it will represent that needs to be transferred from Server to Client and then displayed.

40 fields isn't necessarily a lot, but if you only need 10 fields all the time and 40 once a month, splitting would make sense. The other part to the calculation would be number of rows.

Note that setting up another persistence provider doesn't mean a separate Database server, you can just create a second schema within the same Database server.

Willie · ‎Feb 15, 2021

I've been using ThingworxPersistenceProvider for everything. Is this bad practice? Should there be several persistence providers?

Willie · ‎Feb 22, 2021

Okay, makes sense. Thank you.

Willie · ‎Feb 22, 2021

@PaiChung wrote:

If you use one persistence provider, that means all the records will still reside in one and the same table.

Does this mean data is stored in one table with millions of columns when stored in a persistence provider?

We have Tomcat and Postgres on the same machine. Should they be on separate machines? Would having them on the same machine affect performance?

PaiChung · ‎Feb 22, 2021

It's millions of rows over time yes.

Streams are basically StreamName/TimeStamp/JSONObject (describes the data according to data shape with value)

For production environments, best practice is to indeed have the database server on a separate server.

Willie · ‎Feb 22, 2021

Okay, thank you for the clarification.

I understand what streams, value streams, and data tables are. In your reply, I thought you meant that all stream data are stored in one giant table with millions of columns, meaning a combination of different streams in one giant table with the timestamp as the key in the persistence provider.

@PaiChung wrote:
For production environments, best practice is to indeed have the database server on a separate server.

Okay, PTC cloud service put our Tomcat and Postgres on the same VM. Been trying to figure out why the performance on our servers have been sluggish and how we can improve it.

Willie · ‎Feb 22, 2021

Yes, I meant columns and not rows.

@Anonymous wrote:

Regardless, the underlying SQL tables do not have a correspondence of 1:1 with the columns you define in ThingWorx.

Okay, so a field in a datashape does not correspond to a field in an SQL table in the persistence provider.

I'll take a look at the deployment architecture guide.

slangley · ‎Mar 09, 2021

Hi @Willie.

If you feel your question has been answered, please mark the appropriate response as the Accepted Solution for the benefit of others with the same question.

Regards.

--Sharon