cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

Community Tip - When posting, your subject should be specific and summarize your question. Here are some additional tips on asking a great question. X

When to split a stream in two or more streams

Willie
16-Pearl

When to split a stream in two or more streams

I have a stream that has about 40 fields.  The data in each field is unique so the stream cannot be normalized.  About 2000 records are created each month.  Most of the time, when the stream is queried, not all of the fields are needed.  Should the stream be split into two or more streams?  Will this have any noticeable improvement in performance?  When performing queries, my understanding is that we cannot select the fields to be outputted as in the example below:

https://bi-bigdata.com/2012/09/02/select-vs-select-in-sql-server-query

 

 

10 REPLIES 10
PaiChung
22-Sapphire I
(To:Willie)

Assuming you are using a regular Stream vs. ValueStream, I don't think Stream supports querying only specified columns unlike ValueStreams.

If you know that you need certain fields more often than others, splitting can help, especially if you use multiple persistence providers. 

ie one Stream is supported by schema "A" and the other Stream is supported by schema "B"

First advantage though is less total data retrieved and transferred, but if you use two or more persistence providers than the overall table size of the stored stream reduces significantly.

Would there be any significant performance improvement if there is only one persistence provider?  Would the change in performance be almost negligible?  Is 40 fields a lot for a stream?  There are times when all of the fields need to be displayed.

PaiChung
22-Sapphire I
(To:Willie)

If you use one persistence provider, that means all the records will still reside in one and the same table. There is limited indexing on Streams so the advantage will be reduced to nil. I can't say for sure, you would have to test it.

When you do need all the fields, as long as you have a field to allow joining, you can use the very fast in memory infotable service operations to put the table(s) together. The only part that you would have to be aware of  is the amount of data it will represent that needs to be transferred from Server to Client and then displayed.

40 fields isn't necessarily a lot, but if you only need 10 fields all the time and 40 once a month, splitting would make sense. The other part to the calculation would be number of rows.

 

Note that setting up another persistence provider doesn't mean a separate Database server, you can just create a second schema within the same Database server.

I've been using ThingworxPersistenceProvider for everything.  Is this bad practice?  Should there be several persistence providers?  

Willie
16-Pearl
(To:Willie)

Okay, makes sense.  Thank you.


@PaiChung wrote:

If you use one persistence provider, that means all the records will still reside in one and the same table.


Does this mean data is stored in one table with millions of columns when stored in a persistence provider?  

 

We have Tomcat and Postgres on the same machine.  Should they be on separate machines?  Would having them on the same machine affect performance?

 

PaiChung
22-Sapphire I
(To:Willie)

It's millions of rows over time yes.

Streams are basically StreamName/TimeStamp/JSONObject (describes the data according to data shape with value)

 

For production environments, best practice is to indeed have the database server on a separate server.

Okay, thank you for the clarification.

I understand what streams, value streams, and data tables are.  In your reply, I thought you meant that all stream data are stored in one giant table with millions of columns, meaning a combination of different streams in one giant table with the timestamp as the key in the persistence provider.

 


@PaiChung wrote:

For production environments, best practice is to indeed have the database server on a separate server.

Okay, PTC cloud service put our Tomcat and Postgres on the same VM.  Been trying to figure out why the performance on our servers have been sluggish and how we can improve it.

Willie
16-Pearl
(To:Willie)

Yes, I meant columns and not rows.  

 


@Anonymous wrote:

Regardless, the underlying SQL tables do not have a correspondence of 1:1 with the columns you define in ThingWorx.


Okay, so a field in a datashape does not correspond to a field in an SQL table in the persistence provider.

 

I'll take a look at the deployment architecture guide.

slangley
23-Emerald II
(To:Willie)

Hi @Willie.

 

If you feel your question has been answered, please mark the appropriate response as the Accepted Solution for the benefit of others with the same question.

 

Regards.

 

--Sharon

Top Tags