1-Visitor
October 30, 2015
Solved

Decimating large queries in streams

I have a very large data stream where a typical query for 1 week of data returns tens of thousands of entries. At first the chart did not display all the data within the date range, which I remedied by setting a high maxItems value on the QueryStreamEntriesWithData service. Unfortunately, this slows down the mashup.

Is there a way to query entries so that I can set some maxItems value and have the query skip certain entries while still giving a meaningful display of the data in that date range?

For example, if I had 50,000 entries within a date range of 10 days and maxItems set to 500, the query would return only every 100th entry.
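The arithmetic in this example can be sketched as a simple stride filter. This is only an illustration in plain JavaScript, not a ThingWorx service: `decimate` and the `rows` array are hypothetical names, and the rows are assumed to be already sorted by time.

```javascript
// Keep at most maxItems rows by taking every n-th row (stride decimation).
// `rows` is a hypothetical plain array of entries, already sorted by time.
function decimate(rows, maxItems) {
  if (maxItems <= 0) return [];
  if (rows.length <= maxItems) return rows.slice(); // nothing to drop
  var step = Math.ceil(rows.length / maxItems);     // 50,000 / 500 -> 100
  var kept = [];
  for (var i = 0; i < rows.length; i += step) {
    kept.push(rows[i]);
  }
  return kept;
}
```

Note that a plain stride can drop short spikes that fall between the kept rows, so it suits slowly changing signals best.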

Thanks in advance!

3 replies

1-Visitor
October 30, 2015

I think you will have to aggregate data on a separate Stream.

7-Bedrock
October 31, 2015
Best answer by CK_14383687

Hello Alister,

First of all, avoid saving redundant data in the first place. This can be controlled to some extent by specifying thresholds for "Data change type: Value" on the property.

Then, if you still need to query every n-th entry or aggregate your data somehow, consider these ideas:

  1. Aggregate the data:
    1. On demand, when the user first runs a query service / opens a mashup
    2. On schedule (it's easy to do in TWX)
    3. As the data streams in (use a service to set the properties and inside that service compute some running average over the last N rows, etc.)
  2. Put a flag such as "odd/even" on your data and query only the entries flagged "odd". Or, if you store an hh:mm:ss time on your Thing, query only the entries with ss = 0 and skip the rest.
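As a rough illustration of the aggregation idea in option 1, the sketch below averages values per fixed time bucket. It is plain JavaScript under stated assumptions, not ThingWorx API code: `aggregateByBucket` is a hypothetical helper, and each row is assumed to be an object with a numeric `timestamp` (milliseconds) and a numeric `value`.

```javascript
// Average `value` per fixed time bucket.
// `rows` is a hypothetical array of { timestamp: <ms since epoch>, value: <number> }.
function aggregateByBucket(rows, bucketMs) {
  var buckets = {}; // bucket start time -> { sum, count }
  var order = [];   // bucket start times in first-seen order
  rows.forEach(function (row) {
    var start = Math.floor(row.timestamp / bucketMs) * bucketMs;
    if (!buckets[start]) {
      buckets[start] = { sum: 0, count: 0 };
      order.push(start);
    }
    buckets[start].sum += row.value;
    buckets[start].count += 1;
  });
  // One output row per bucket, carrying the mean and the number of source rows.
  return order.map(function (start) {
    var b = buckets[start];
    return { timestamp: start, value: b.sum / b.count, count: b.count };
  });
}
```

The same function could be run on demand, on a schedule, or over a window of recent rows; with 10 days of data, a bucket of about 29 minutes (`bucketMs = 1728000`) yields roughly 500 points.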

I hope this makes sense. In general, think about how you'd do it in SQL -- the concept is similar.

/ Constantine

1-Visitor
October 31, 2015

For option 2 that Constantine proposed, you can also use Tags to filter the data.
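Both filtering ideas (entries with ss = 0, or entries carrying a particular tag) come down to a predicate applied to each row. A minimal sketch in plain JavaScript follows; the function names and the row shape (`timestamp` in milliseconds, optional `tags` array of strings) are assumptions made for illustration and do not reflect how ThingWorx stores tags internally.

```javascript
// Keep only entries whose timestamp falls on a whole minute (ss = 0).
function onWholeMinute(rows) {
  return rows.filter(function (row) {
    return new Date(row.timestamp).getUTCSeconds() === 0;
  });
}

// Keep only entries carrying a given tag.
// `tags` is a hypothetical per-entry array of strings.
function withTag(rows, tag) {
  return rows.filter(function (row) {
    return Array.isArray(row.tags) && row.tags.indexOf(tag) !== -1;
  });
}
```

The seconds-based filter only thins the data predictably when entries are logged at a regular interval; with irregular timestamps, a tag or flag written at insert time is the more dependable choice.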