influxdb schema

atondorf · ‎Sep 30, 2020

Dear Zar, dear Community, dear PTC,

I am currently playing around with Thingworx and InfluxDB, and very early in the process I ran into the same question. Before you start asking for more details, here they are:

Thingworx supports InfluxDB as a PersistenceProvider for ValueStreams and Streams!

The two differ in functionality and usage.
(The usage of Streams is still unclear to me. The documentation is not very informative here either: it stops after “how to create a stream”, with no information on “how to use” one.
Like most other PTC documentation, it stops just when it gets to the interesting, fun part ... ☹)

So, let’s have a short look at InfluxDB’s main schema objects. There are:

database: self-explanatory …
measurement: it’s kind of like a table that contains timestamps and values
field: it’s kind of like a column in the measurement table
- Fields store the actual values of a time series.
- Important fact: they are not indexed. Queries on field values scan all points that match the specified time range and, as a result, are not performant.
tag: also a kind of column in the measurement table, but different in that:
- Tags are an optional part of the structure; they can store additional metadata.
- Tags are indexed, so queries that filter and group on tags are performant.

Now let’s have a look at how Thingworx maps its data to this schema for value streams.

database: is defined in the config of the persistence provider.
measurement: is the name of the thing that uses the value stream,
so there is one measurement for each thing that logs properties
field: is the name of the thing property,
so there is one field for each property that is logged
tag: not really used in Thingworx; only the name of the value stream is stored here

For streams this is a little bit different:

measurement: is the name of the Stream
field: is defined by the datashape of the stream,
there is one field in Influx per field in the datashape
- There are two additional fields, “locations” & “tags”.
- Streams also have a tagging feature to add metadata, but it stores this in fields!
tag: a stream defines two tags in InfluxDB, “sourename” & “sourcetype”
- These values are just parameters of the service "Stream.AddStreamEntry()".
- When data is stored using the service "GenericThing.WritePropertiesToStream()",
  they are set to sourename=“[name of thing]” & sourcetype=“thing”.
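The two mappings described above can be sketched as plain Python dictionaries. This is only an illustration of the description in this post, not Thingworx code: the function names are invented, the tag key "valuestream" is an assumption (the post only says the value stream name ends up in a tag), and the stream tag keys are spelled exactly as written above.

```python
def value_stream_point(thing_name, value_stream_name, logged_properties):
    """Value stream mapping: one measurement per thing, one field per property."""
    return {
        "measurement": thing_name,
        # Assumed tag key; the only tag carries the name of the value stream.
        "tags": {"valuestream": value_stream_name},
        "fields": dict(logged_properties),
    }


def stream_point(stream_name, source_name, source_type, row,
                 location=None, entry_tags=None):
    """Stream mapping: one measurement per stream, one field per datashape field."""
    fields = dict(row)
    fields["locations"] = location  # additional field added for streams
    fields["tags"] = entry_tags     # Thingworx tags end up in a field, not a tag
    return {
        "measurement": stream_name,
        # Tag keys spelled as in the post above.
        "tags": {"sourename": source_name, "sourcetype": source_type},
        "fields": fields,
    }


vs = value_stream_point("ReelMachine1", "MachineValueStream",
                        {"Speed": 12.5, "OrderNr": "4711"})
st = stream_point("ReelEvents", "ReelMachine1", "thing", {"ReelNr": "R-17"})
```

In both cases a property such as OrderNr lands in "fields", so it is not indexed.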

So Thingworx makes some use of InfluxDB tags, but not in a way that is useful for anyone working with the raw data in Influx. The SELECT statement of InfluxDB allows a WHERE clause and a GROUP BY clause, but without meaningful tags there is little point in using them.

An example:
Let’s assume a reel-to-reel process. The machine is represented by a thing with many different properties representing process parameters that may change quickly, like:

Speed, Status, Length of reels
Currents, Powerconsumption
Pressures, Temperatures …
Setpoints, Target Values …
Etc …

Additionally, the thing has some “non-process information” that never or only seldom changes, like:

Machine ID, Location of the
OrderNr, CustomerNr, …
ReelNr, MaterialNr, …
ShiftId, …

So, the data of the first category will mostly be part of a SELECT statement, whereas the information of the second category is more likely to be used in WHERE and GROUP BY clauses. This information should therefore be handled as metadata and stored in tags to allow performant WHERE and GROUP BY clauses.
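As a sketch of the kind of query this would enable, the helper below builds an InfluxQL statement that selects a process value and filters and groups on metadata. The helper name and the measurement name "ReelMachine1" are hypothetical, and the values are interpolated without escaping, so this is for illustration only. It assumes OrderNr and ReelNr are stored as tags; InfluxQL's GROUP BY works on tags and time intervals, not on fields.

```python
def mean_by_tag_query(measurement, field, where_tag, where_value, group_tag):
    """Build an InfluxQL query: mean of a field, filtered and grouped by tags."""
    return (
        f'SELECT MEAN("{field}") FROM "{measurement}" '  # process value (field)
        f'WHERE "{where_tag}" = ' + f"'{where_value}'"   # metadata filter (tag)
        + f' GROUP BY "{group_tag}"'                     # metadata grouping (tag)
    )


query = mean_by_tag_query("ReelMachine1", "Speed", "OrderNr", "4711", "ReelNr")
print(query)
# SELECT MEAN("Speed") FROM "ReelMachine1" WHERE "OrderNr" = '4711' GROUP BY "ReelNr"
```

With the current mapping, OrderNr and ReelNr would be fields, so the filter would scan every point in the time range and the grouping would not be available.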

How could this be integrated into Thingworx? I see many possibilities:

Take an approach similar to the one used in the ThingTemplate “Database”.
- In its config I can define “Field/Column Name Aliases” and configure a mapping of table column names to DataShape field names.
- We could use a similar approach to map Thing property names to InfluxDB tags.
Add a flag to the “Advanced Settings” of a Thing property to mark it as metadata and store it as a tag in InfluxDB.
Give us a special ThingShape that handles these tags, and let the ValueStream access them there.
Provide some special services to set the current tags that the thing should use ...
.....
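To illustrate the first two ideas (an alias mapping plus a per-property metadata flag), here is a rough sketch of how such a configuration could split a thing's properties into tags and fields. Nothing here exists in Thingworx: PropertyConfig, store_as_tag, tag_alias and build_point are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConfig:
    """Hypothetical per-property setting: log as a field or as a tag."""
    name: str
    store_as_tag: bool = False       # the proposed "MetaData" flag
    tag_alias: Optional[str] = None  # optional alias for the InfluxDB tag key


def build_point(thing_name, configs, values):
    """Split current property values into indexed tags and plain fields."""
    tags, fields = {}, {}
    for cfg in configs:
        if cfg.name not in values:
            continue  # property has no current value, skip it
        if cfg.store_as_tag:
            tags[cfg.tag_alias or cfg.name] = str(values[cfg.name])
        else:
            fields[cfg.name] = values[cfg.name]
    return {"measurement": thing_name, "tags": tags, "fields": fields}


configs = [
    PropertyConfig("Speed"),
    PropertyConfig("Status"),
    PropertyConfig("OrderNr", store_as_tag=True),
    PropertyConfig("ShiftId", store_as_tag=True, tag_alias="shift"),
]
point = build_point(
    "ReelMachine1",
    configs,
    {"Speed": 12.5, "Status": 3, "OrderNr": 4711, "ShiftId": "A"},
)
```

Tag values are converted to strings because InfluxDB stores tags as strings only.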

Please! This feature is fundamental for InfluxDB. Otherwise we only have a time series store that is of no use beyond “give me some values in [TS1-TS2]”. That’s not what InfluxDB is meant for.

Many Greetings

Andreas

cbaldwin · ‎Dec 10, 2020

Thank you for this feedback. It was good to meet you Andreas yesterday.

I have passed this along as a sample use case to drive our future work with the InFlux team. I don't anticipate any significant changes to the InFlux until the latter half of 2021, given some other items that are on our roadmap (Enterprise Kepware Configuration Management, Streaming and Enrichment performance/scale improvements, and a new Digital Performance Management solution).

However, since we know time series support is key for nearly all of our platform customers, I do foresee us making improvements to our use of InFlux once we burn down a few other items on our backlog.