cancel
Showing results for 
Search instead for 
Did you mean: 
cancel
Showing results for 
Search instead for 
Did you mean: 

Designing for Scale - Events and Data Handling

Tanzanite

Designing for Scale - Events and Data Handling

This is a lessons learned write up that I proposed to present at Liveworx but it didn't make the cut, but I did want to share it with all the developer folks.

Please note that this is before we added Influx and Micro Services, which help improve the landscape.

Oh and it's long :)

------------------------------------------

This is written as of Thingworx 8.2

 

Different ways to scale Data and Processing with Thingworx

Two main issues are targeted

  1. Data Storage
  2. Platform processing

Data Storage in Thingworx

Background

Issues around storage is that due to the limited indexing in the Persistence Provider with then the actual values according to the datashape being in a JSON Blob

So when you look in the Persistence Provider you’ll see

Source

sourceType

Location

entityID

Datetime

Tags

ValueJSONBlob

 

The first six carry an index, the JSON Blob which holds the values according to the datashape is not, that can read something like {value1:firstvalue,value2:secondvalue,value3:[ …. ]} etc.

This means that any queries beyond the standard keys – date/time, entityID (name of Stream or DataTable), source, sourcetype, tags, location become very inefficient because it will query the records and then apply the datashape query server side.

Potentially this can cause you to pull way more records over from Persistence Provider to Platform than intended.

Ie: a Query on Temperature in my data, that should return 25 records for a given month, will perhaps first return 250K records and then filter own to 25.

The second issue with storage is that all Streams are stored in one table in the Persistence Provider using entityID as an additional key to figure out which stream the record is for.

This means that your record count per table goes up much faster than you’d expect.

Ie: If I have defined 5 ValueStreams for 5 different asset types, ultimately all that data is still in one table in the Persistence Provder. So if each has 250K records, a query against the valuestream will then in actuality be a query against 1.25 million records.

I think both of these issues are well known and documented? By now and Dev is working on it.

Solution approaches

So if you are expecting to store a lot of records what can you do?

Archive

The easiest solution is to keep a limited set and archive off the rest of the data, preferably into a client’s datalake that is not part of the persistence provider, remember archiving from one stream to another stream is not a solution! Unless … you use Multiple Persistence Providers

Multiple Persistence Providers

Thingworx does support multiple persistence providers for storing data. So you can spin up extra schemas (potentially even in the same DataBase Server) to be the store for additional Persistence Providers which then are mapped to a specific Stream/ValueStream/DataTable/Blog/Wiki.

You still have to deal with the query challenge, but you now have less records per data store to query through.

Direct queries in the Persistence Provider

If you have full access to your persistence provider (NOTE: PTC Cloud Services does NOT provide this right now). You can create an additional JDBC connection to the Persistence Provider and query the stream directly, this allows you to query on the indexed records with in addition a text search through the JSON Blob all server side.

With this approach a query that took several minutes at times Platform side using QueryStreamEntries took only a few seconds. Biggest savings was the fact that you didn’t have to transfer so many records back to the Platform server.

Additional Schemas

You can create your own schema (either within the persistence provider DB – again not supported by PTC Cloud Services) in a Database Server of your choice and connect to it with JDBC/REST. (NOTE: I believe PTC Cloud Service may/might offer a standalone server with actual root access)

This does mean you have to create your own Getter/Setter services to retrieve and store information, plus you’ll need some event to store (like DataChange).

This approach right now is probably a common if not best practice recommendation if historical information is required for the solution and the record count looks to go over 1 million records and can’t just be queried based on timestamp.

Thingworx Event Processing

Background

Thingworx will consistently deal with many Things that have many Properties, and often times there will be Alerts/Rules that need to run based on value changes.

When you are using straight up Alerts based on a limit value, this isn’t such a challenge, but what if you need to add some latch/lock/debounce logic or need to check against historical values or check multiple conditions?

How can you design something that can handle evaluating these complex rules, holds some historical or derived values and avoid race conditions and be responsive?

Potential Problems

Race conditions

Multiple Events may need to update the same Permanent or Temporary store for the determination of a condition.

Duplicates

If you don’t have some ‘central’ tracker, you may possibly trigger the same rule multiple times.

Slow response

You are potentially triggering thousands or more events at the same time, depending on how you’ve set up your logic, your response could become so slow that the next event will be firing before finish and you’ll overload the system.

System queue overrun

If your events trigger faster than you can handle the events, you will slowly build up and finally overrun the event queue.

System Thread count overrun

Based on the number of cores in your system, you can overrun the number of threads that can be handled.

Connection Pool overrun

Each read/write to a stream/datatable but also Property Persist is a usage of the connection pool to your persistence provider. If you fire a lot at once, you can stack up requests and cause deadlocks

System out of memory

Potentially in handling the events you are depending on in memory information, if that is something that grows over time, you could hit an ‘Out of Memory’ issue.

Solution Approaches

Batch processing

Especially with Agents/Sources that write a set of property updates, you potentially trigger multiple threads that all may need the same source information or update the same target information.

If you are able to process this as a batch, you can take all values in account and only process this as a single event and have just a single read from source or single write to target.

This will be difficult to achieve when using something like Kepserver, unless it is transferring as something non-standard like MQTT.

But if you can have the data come in as a single REST POST this approach becomes possible.

In Memory vs. Table/Stream Storage

To speed up response time, you can put necessary information into Memory vs. in a DataTable or Stream. For example, if you need the most current received record together with some historical values, you could:

  • Use a Stream but carry the current value because the stream updates async. (ie adding the current value to the stream doesn’t guarantee that when you read from the stream it has already been committed)
  • Use a DataTable because they are synchronous but it can make the execution slow, especially if you are reaching 100K records or more
  • Use an InfoTable or JSON Property, now this information is in memory and runs the fastest and is synchronous. Note that in some speed testing JSON object was faster than InfoTable and way faster than DataTable. One challenge is that you would have to do a full overwrite if you need to persist this information. Doing a full write does open up the danger of a race condition, if this information is being updated by multiple threads at the same time. If it is ok to keep the information in memory than an InfoTable is nice because you can just add/delete rows in memory. I sadly haven’t figured out yet how to directly do this to a JSON object property :(. It is important to consider disaster recovery scenarios if you are only using this in memory

Centralized Processing vs. Distributed Processing

Think about how you can possibly execute some logic within the context of the Entity itself (logic within the ThingShape/ThingTemplate) vs. having it fire into a centralized Service (sync or async) on a separate Entity.

Scheduler or Timer

As much as Schedulers and Timers are often the culprit of too many threads at the same time, a well setup piece of logic that is triggered by a Scheduler or Timer can be the solution to avoid race conditions

If you are working with multiple timers, you may want to consider multiple schedulers which will trigger at a specific time, which means you can eliminate concurrence (several timers firing at the same time)

Think about staggering execution if necessary, by using the hated, looked down upon … but oft necessary … pause() function !!!!

Synchronous vs. Asynchronous

Asynchronous execution can give great savings on the processing speed of a thread, since it will kick off the asynch parts and continue on. The terrible draw back, you can’t tell when it is finished nor what the resulting output is.

As you mix and match synch/asynch vs processing speed, you may need to consider other ways to pick up when an asynch process finishes, some Property elsewhere that will trigger into a DataChange for example.

Interesting examples

Batch Process

With one client there was a batch process that would post several hundred results at once that all had to be evaluated. The evaluation also relied on historical information.

So with some logic these properties were processed as a batch, related to each other and also compared to information held in memory besides historically storing the information that came in.

This utilized several in memory objects and ultimately also an eval() statement to have the greatest flexibility and performance.

Mix and Match

With another client, they had a requirement to have logic to do latch/lock and escalation. This means that some information needs to be persisted, however because all the several hundred properties per asset are coming in through Kepware once a second, it also had to be very fast. The approach here was to have the DataChange place information into an in memory infotable that then was picked up by a separate latch/lock/escalation timer to move it over to the persistent side.

This allowed for the instantaneous processing of DataChange and Alerts, but also a more persistent processing of latch/lock/escalation logic.

In Conclusion

Remember that PTC created its software for specific purposes. I don’t think there ever will be a perfect magical platform that will do everything we need and want. Thingworx started out on a specific path which was very high speed data ingest and event platform with agnostic all around connectivity, that provided a very nice holistic modeling approach and a simple way to build UI/UX. Our use cases will sometimes go right past everything and at times to the final frontier aka the bleeding edge and few are a carbon copy of another. This means we need to be innovative and creative. Hopefully all of you can use the expert knowledge you have about our products to create those, but then also be proactive and please share with everyone else!