Skip to main content
16-Pearl
February 15, 2021
Question

When to split a stream in two or more streams

  • February 15, 2021
  • 2 replies
  • 2646 views

I have a stream that has about 40 fields.  The data in each field is unique so the stream cannot be normalized.  About 2000 records are created each month.  Most of the time, when the stream is queried, not all of the fields are needed.  Should the stream be split into two or more streams?  Will this have any noticeable improvement in performance?  When performing queries, my understanding is that we cannot select the fields to be outputted as in the example below:

https://bi-bigdata.com/2012/09/02/select-vs-select-in-sql-server-query

 

 

2 replies

22-Sapphire I
February 15, 2021

Assuming you are using a regular Stream vs. ValueStream, I don't think Stream supports querying only specified columns unlike ValueStreams.

If you know that you need certain fields more often than others, splitting can help, especially if you use multiple persistence providers. 

ie one Stream is supported by schema "A" and the other Stream is supported by schema "B"

First advantage though is less total data retrieved and transferred, but if you use two or more persistence providers than the overall table size of the stored stream reduces significantly.

Willie16-PearlAuthor
16-Pearl
February 15, 2021

Would there be any significant performance improvement if there is only one persistence provider?  Would the change in performance be almost negligible?  Is 40 fields a lot for a stream?  There are times when all of the fields need to be displayed.

22-Sapphire I
February 15, 2021

If you use one persistence provider, that means all the records will still reside in one and the same table. There is limited indexing on Streams so the advantage will be reduced to nil. I can't say for sure, you would have to test it.

When you do need all the fields, as long as you have a field to allow joining, you can use the very fast in memory infotable service operations to put the table(s) together. The only part that you would have to be aware of  is the amount of data it will represent that needs to be transferred from Server to Client and then displayed.

40 fields isn't necessarily a lot, but if you only need 10 fields all the time and 40 once a month, splitting would make sense. The other part to the calculation would be number of rows.

 

Note that setting up another persistence provider doesn't mean a separate Database server, you can just create a second schema within the same Database server.

Support
March 9, 2021

Hi @Willie.

 

If you feel your question has been answered, please mark the appropriate response as the Accepted Solution for the benefit of others with the same question.

 

Regards.

 

--Sharon