Community Tip - When posting, your subject should be specific and summarize your question. Here are some additional tips on asking a great question. X
I have a stream containing "alarms"
the source field is the name of the equipment who generated the alarm, like MACHINE1 MACHINE2 ....
I need to query entries for all records with two or more source names (not just one)
I Use QueryStreamEntriesWithData, keeping source parameter as undefined and using a query with the "IN" syntax, like {source IN array_of_elements} (where each element is a machine name).
This query works ... but it is slow if I have millions of rows and searching for many months (I also use time range obviously, and limit result to 2000 records).
is the source field indexed ? is this query executed at the database or thingworx layer ?
is there a faster way ?
It is important to use (some of) the following parameters : source, startDate, endDate, sourceTags or Tags when querying large streams with QueryStreamEntriesWithData.
The query filter is always applied in-memory on the platform side.
For example :
var query = {
"filters": {
"type": "EQ",
"fieldName": "name",
"value": "aa"
}
};
var result = Things["myStream"].QueryStreamEntriesWithData({query: query});
The code above is fetching the entire stream from the DB, the filter is then applied on the platform in-memory.
Impacted services : QueryDataTableEntries, QueryStreamData, QueryStreamEntries, QueryStreamEntriesWithData
Note that maxItems is also applied in-memory when used in conjunction with the query filter :
var query = {...};
var result = Things["myStream"].QueryStreamEntriesWithData({maxItems: 1, query: query});
This code is also fetching the entire stream from the DB
In appendix to this answer, I have 2 suggestions that might help.
1. Put your Stream data in InfluxDB (the new persistence provider after 8.4)
The InfluxDB is designed for timeseries data processing, so for Streams, the InfluxDB query speed is about 3 times faster than pgSQL. While since your Index filtering is done in TWX platform side, you'll just get a small improve on the overall process.
2. Give up using Stream and put the data in External JDBC database instead.
In this way, you can store more data, and use StoredProcedure to speed up the search.
The database would operate much faster than TWX platform in Filtering, and when you have a large size of Data, it's recommanded not to save in your primary Persistence provider (it will slow your platform speed, and oversized Stream table may cause a index overflow, it would require a Vacuum process and TWX stop working by the time.)
Hello,
The source field is indexed, but this index is only used when you use the corresponding parameter. Anything that is specified as query is applied in ThingWorx after the result set is fetched from the database.
For querying by multiple sources a series of queries (one per filtered source) + merging results might still work faster than using query parameter (depends on the number of queries of course).
/ Constantine
Thank you all !
so a summary on where filtering is done may be this ?
var params = {
oldestFirst: undefined /* BOOLEAN */, // ??? sort done at DB level or platform ?
maxItems: undefined /* NUMBER */, // done ad DB if query is not used, differentry at platform level
sourceTags: undefined /* TAGS */, // ??? dont know ...
endDate: undefined /* DATETIME */, // INDEXED, filtering done on DB
query: undefined /* QUERY */, // WARNING: done ad platform level (slower)
source: undefined /* STRING */, // INDEXED, filtering done on DB
startDate: undefined /* DATETIME */, // INDEXED, filtering done on DB
tags: undefined /* TAGS */ // ??? dont know ...
};
var result = me.QueryStreamEntriesWithData(params);
I'll try what suggested by Costantine.
Note that maxItems is "performed" on the DB if you don't use the query parameter.
Yes thanks, I updated the post
It may depend on the persistence provider, but in 8.4.5 + Postgresql :