
Decimating large queries in streams

apineda

I have a very large data stream where a typical query of 1 week of data returns tens of thousands of entries. At first, the chart did not display all the data within the date range, but I fixed this by setting a high maxItems value on the QueryStreamEntriesWithData service. Unfortunately, this slows down the mashup.

Is there a way to query entries so that I can set a maxItems value and have the query skip some entries while still giving a meaningful display of the data over that date range?

For example, if I had 50,000 entries within a 10-day date range and maxItems set to 500, the query would return every 100th entry and skip the rest.
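Here is a rough illustration of the arithmetic in that example. With 50,000 entries and a cap of 500, the stride is ceil(50,000 / 500) = 100. The sketch assumes the entries have already been fetched as an ordered list. `decimate` is a hypothetical helper, not a ThingWorx service. Thinning on the client like this does not by itself avoid the cost of pulling every row from the stream.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decimate(entries: Sequence[T], max_items: int) -> List[T]:
    """Hypothetical helper: keep every n-th entry so at most max_items remain."""
    if max_items <= 0:
        raise ValueError("max_items must be positive")
    if len(entries) <= max_items:
        return list(entries)
    # Round the stride up so the result can never exceed max_items.
    stride = math.ceil(len(entries) / max_items)
    return list(entries[::stride])


# 50,000 entries capped at 500 -> stride 100, i.e. every 100th entry
rows = list(range(50_000))
sampled = decimate(rows, 500)
print(len(sampled), sampled[:3])  # 500 [0, 100, 200]
```

Plain striding can miss short spikes that fall between kept entries. That is one reason the answers below lean toward aggregation.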

Thanks in advance!

ACCEPTED SOLUTION
ckulak
(To:apineda)

Hello Alister,

First of all, avoid saving redundant data in the first place. You can control this to some extent by setting the property's data change type to "Value" and specifying a threshold.
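The minimal Python sketch below shows the idea behind a value-change threshold (a deadband). It is not ThingWorx's implementation. `deadband_filter` is hypothetical, and it assumes a value is stored only when it differs from the last stored value by more than the threshold. Whether the platform compares with > or >= is not confirmed here.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


def deadband_filter(samples: Iterable[Sample], threshold: float) -> List[Sample]:
    """Hypothetical: keep a sample only if it moved more than `threshold`
    away from the last *kept* value (not the last seen value)."""
    kept: List[Sample] = []
    last_value: Optional[float] = None
    for ts, value in samples:
        if last_value is None or abs(value - last_value) > threshold:
            kept.append((ts, value))
            last_value = value
    return kept


readings = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.5), (4, 21.4), (5, 19.0)]
print(deadband_filter(readings, threshold=0.5))
# [(0, 20.0), (3, 21.5), (5, 19.0)]
```

Comparing against the last kept value, rather than the previous sample, prevents a slow drift from being dropped forever in small steps.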

Then, if you still need to query every n-th entry or aggregate your data somehow, consider these ideas:

  1. Aggregate the data:
    1. On demand, when the user first runs a query service / opens a mashup
    2. On schedule (it's easy to do in TWX)
    3. As the data streams in (use a service to set the properties and inside that service compute some running average over the last N rows, etc.)
  2. Put a flag such as "odd/even" on your data and query only the entries that are "odd". Or, if you store a time of day (hh:mm:ss) on your Thing, you can query only the entries with ss = 0 and skip the rest.
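For option 1.3, a running average over the last N rows can be updated in constant time per new value. The sketch below is a hypothetical Python helper, not a ThingWorx API. It assumes the logic would live in the service that sets the property, with the averaged value written to a separate, smaller stream.

```python
from collections import deque
from typing import Deque


class RunningAverage:
    """Hypothetical helper: mean of the last `window` values, O(1) per update."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.values: Deque[float] = deque()
        self.total = 0.0

    def add(self, value: float) -> float:
        """Add a new value and return the mean of the current window."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            # Drop the oldest value so the sum covers only the last `window` rows.
            self.total -= self.values.popleft()
        # Note: over very long runs, floating-point drift can accumulate in
        # self.total; recomputing sum(self.values) occasionally resets it.
        return self.total / len(self.values)


avg = RunningAverage(window=3)
print([avg.add(v) for v in [1, 2, 3, 4, 5, 6]])  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```

Writing the average out only every `window` values, instead of on every update, is what actually shrinks the aggregate stream.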

I hope this makes sense. In general, try to think of how you would do it in SQL; the concept is similar.
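Following the SQL analogy, both ideas map onto ordinary queries. The sketch uses an in-memory SQLite table named `stream` as a stand-in for a ThingWorx stream; it does not query ThingWorx itself. The table has hypothetical columns `ts` (seconds since a start time aligned to a whole minute) and `value`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a stream: ts = seconds since a minute-aligned start.
conn.execute("CREATE TABLE stream (ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO stream (ts, value) VALUES (?, ?)",
    [(t, float(t % 60)) for t in range(3600)],  # one hour of 1-second samples
)

# Idea 1: aggregate into 60-second buckets (one summary row per minute).
# ts / 60 is integer division because both operands are integers.
per_minute = conn.execute(
    """
    SELECT ts / 60 AS minute, AVG(value), MIN(value), MAX(value)
    FROM stream
    GROUP BY ts / 60
    ORDER BY minute
    """
).fetchall()

# Idea 2: keep only entries whose seconds field is 0 (the "ss = 0" filter).
on_the_minute = conn.execute(
    "SELECT ts, value FROM stream WHERE ts % 60 = 0 ORDER BY ts"
).fetchall()

print(len(per_minute), per_minute[0])         # 60 (0, 29.5, 0.0, 59.0)
print(len(on_the_minute), on_the_minute[:2])  # 60 [(0, 0.0), (60, 0.0)]
```

Keeping MIN and MAX alongside AVG preserves peaks that a plain average or an every-n-th filter would hide.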

/ Constantine


3 REPLIES

I think you will have to aggregate the data into a separate Stream.

For option 2 that Constantine proposed, you can also use Tags to filter the data.
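Here is a minimal sketch of the tagging approach. It assumes the tag is assigned when the entry is written: every n-th entry gets a marker tag, and the read side filters on that tag. `Entry`, `TaggingWriter`, `query_by_tag`, and the tag name `every_n` are hypothetical stand-ins for a stream, its Tags, and a tag-filtered query. They are not the ThingWorx API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Entry:
    """Hypothetical stream entry with a set of tags."""
    ts: int
    value: float
    tags: Set[str] = field(default_factory=set)


class TaggingWriter:
    """Hypothetical writer that tags every n-th entry as it is stored."""

    def __init__(self, n: int, tag: str = "every_n") -> None:
        self.n = n
        self.tag = tag
        self.count = 0
        self.stream: List[Entry] = []

    def write(self, ts: int, value: float) -> None:
        # Entries 0, n, 2n, ... carry the marker tag; the rest are untagged.
        tags = {self.tag} if self.count % self.n == 0 else set()
        self.stream.append(Entry(ts, value, tags))
        self.count += 1


def query_by_tag(stream: List[Entry], tag: str) -> List[Entry]:
    """Hypothetical read side: return only entries carrying `tag`."""
    return [e for e in stream if tag in e.tags]


writer = TaggingWriter(n=100)
for t in range(50_000):
    writer.write(t, float(t))
thinned = query_by_tag(writer.stream, "every_n")
print(len(thinned), thinned[1].ts)  # 500 100
```

One consequence of this design is that the decimation factor is fixed at write time. Viewing full resolution still means querying without the tag filter, and supporting several zoom levels would need several tags.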
