How to bulk load historical data into Thingworx

DmitryTsarev

I'm trying to load historical data into Thingworx. I've tried several approaches, but still can't get the desired result and/or performance.

 

So, I have a bunch of data (~30 devices, ~40 sensors per device, ~120 000 data points per sensor, with timestamps).

I'm using ValueStreams to store data.

 

The first thing I tried was a plain PUT to "https://<thingworx>/Thingworx/Things/<thingname>/Properties/*". This way I can load all 40 sensors with a single REST request, and Thingworx consumes something like 150 PUT requests/sec. Works like a charm and is quite fast for a laptop VM, but I couldn't find a way to pass my datetime values into the timestamp, so the timestamps I get reflect when the data was loaded, not when it was measured. I could create a separate property, like 'event_time', but then I'd have to write wrapper services around the timestamp-based Thingworx OOTB services, like QueryPropertyHistory. That looks like a kludge / workaround / hack to me rather than a feasible solution, and I'd prefer to avoid it.
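For reference, the bulk PUT looks roughly like this (a minimal sketch in Python with the requests library; the host, app key, Thing name, and property names are placeholders, not real ones from my system):

```python
import requests

TWX = "https://<thingworx>/Thingworx"   # placeholder host
HEADERS = {
    "Content-Type": "application/json",
    "appKey": "<app-key>",              # placeholder application key
}

# One PUT updates all 40 sensor properties of the Thing in a single request,
# but the resulting value stream entries are stamped with the ingest time,
# not the historical device time.
payload = {f"sensor_{i:02d}": 42.0 for i in range(40)}   # hypothetical names

resp = requests.put(f"{TWX}/Things/MyDevice/Properties/*",
                    json=payload, headers=HEADERS)
resp.raise_for_status()
```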

I could probably update the timestamps after loading the data but, again, I couldn't find a way to do this.

 

So I tried using AddNumberValueStreamEntry. It has 'timestamp' as one of its input parameters, so I can put my datetime there. But I couldn't find a way to load data with this service other than calling it for every single sensor data point, so loading the entire dataset would take 120 000 × 40 REST requests, and that's really slow.
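The per-point pattern I ended up with looks something like this (a sketch only; I'm assuming the parameter names, so check the actual service definition in Composer):

```python
import requests
from datetime import datetime, timedelta, timezone

TWX = "https://<thingworx>/Thingworx"   # placeholder host
HEADERS = {"Content-Type": "application/json", "appKey": "<app-key>"}

def add_entry(value_stream, prop_name, value, ts):
    """One REST call per data point -- this is what makes the load so slow."""
    body = {
        "name": prop_name,                        # assumed parameter names
        "value": value,
        "timestamp": int(ts.timestamp() * 1000),  # epoch millis for DATETIME
    }
    r = requests.post(
        f"{TWX}/Things/{value_stream}/Services/AddNumberValueStreamEntry",
        json=body, headers=HEADERS)
    r.raise_for_status()

# 120 000 points x 40 sensors = 4.8 million requests for a single device
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
for i in range(120_000):
    add_entry("MyValueStream", "sensor_00", 42.0, start + timedelta(minutes=i))
```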

Well, it does work, and I can probably live with it, but surely there must be a better way?

 

There is an AddStreamEntries service, but it's not available for ValueStreams, only for Streams.

I could load the data directly into the DBMS of choice (I'm using InfluxDB), but that's usually frowned upon and, I guess, not supported by PTC.

 

What can be done about this? Loading historical data seems like quite a common task to me. Are there "best practices" for this? Am I overlooking some approach or service?

 

A bonus question: I'm not really a seasoned Thingworx guy, so I wasn't sure when choosing ValueStream vs. Stream for storing the data. Is there a simple yes/no answer as to whether choosing ValueStream was a good decision, or is that a matter for a separate post / topic?

1 ACCEPTED SOLUTION

slangley
(To: DmitryTsarev)

Hi @DmitryTsarev.

 

Try the UpdatePropertyValues service. This service exists on the Thing itself and accepts VTQ (value/time/quality) input, which will allow you to capture the date/time of the device. It accepts an infotable, allowing you to pass in multiple data points for the Thing.
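A minimal sketch of what calling it over REST could look like (assuming the service's infotable parameter is named "values" and that rows follow a name/time/value/quality shape; verify against the service definition on your Thing):

```python
import requests
from datetime import datetime, timedelta, timezone

TWX = "https://<thingworx>/Thingworx"   # placeholder host
HEADERS = {"Content-Type": "application/json", "appKey": "<app-key>"}

start = datetime(2020, 1, 1, tzinfo=timezone.utc)

# One row per (property, timestamp) pair; the 'time' field is what lets the
# historical device timestamp end up in the value stream.
rows = [
    {
        "name": "sensor_00",    # property name (hypothetical)
        "time": int((start + timedelta(minutes=i)).timestamp() * 1000),
        "value": 42.0,
        "quality": "GOOD",
    }
    for i in range(120_000)
]

body = {"values": {"rows": rows}}   # assumed name of the infotable input

r = requests.post(f"{TWX}/Things/MyDevice/Services/UpdatePropertyValues",
                  json=body, headers=HEADERS)
r.raise_for_status()
```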

 

Please let us know if this will work for your use case.

 

Regards.

 

--Sharon

5 REPLIES

DmitryTsarev
(To: slangley)

Thank you, Sharon @slangley 

You're my hero for this weekend. Using UpdatePropertyValues, it takes literally just a few seconds to load the full history (i.e. 120 000 measurements) for a single sensor. I haven't tested thoroughly yet, but if I'm not missing anything, it should take only 2-3 minutes for a single device (i.e. 40 sensors). I couldn't even have dreamed of that.

VladimirRosu
(To: DmitryTsarev)

Looking at the raw number of updates you wanted to do in one go, I think you should be aware (even if I suspect you won't actually reach that number) that you might need a small batching approach if you want to bulk load even more data.

The reason is the following: the ValueProcessingSubsystem, which handles historical storage of property updates, has a fixed but configurable maximum number of stream entries to queue. Look in the monitoring subsystem at ValueStreamProcessing, specifically at "Maximum number of stream entries to queue". On my system it has a value of 500 000. If I were to feed the UpdatePropertyValues service an infotable with 500 001 rows, the last row would be discarded.

In practice I don't think this kind of thing happens often, but just in case, be aware.
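To make the batching concrete, a hypothetical chunking helper along those lines (the limit is the one quoted above; the batch size is just a safe margin, and the "values" parameter name is assumed as in the earlier sketch):

```python
import requests

TWX = "https://<thingworx>/Thingworx"   # placeholder host
HEADERS = {"Content-Type": "application/json", "appKey": "<app-key>"}

BATCH = 100_000   # arbitrary margin below the 500 000 queue limit quoted above

def update_in_batches(thing, rows):
    """Feed UpdatePropertyValues in chunks so no rows hit the queue cap."""
    for i in range(0, len(rows), BATCH):
        body = {"values": {"rows": rows[i:i + BATCH]}}
        r = requests.post(
            f"{TWX}/Things/{thing}/Services/UpdatePropertyValues",
            json=body, headers=HEADERS)
        r.raise_for_status()
```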

@VladimirRosu 

I was thinking about possible limits on the amount of data in a single request and was actually prepared to split the data into batches, but it worked out fine with all 120 000 rows at once.

 

Thank you for the insight and explanation regarding the ValueStreamProcessing subsystem. I didn't know about it, and it's always useful and valuable to know such details about how things work.

Regarding the bonus question: I think existing posts already tackle this idea, like this article: https://www.ptc.com/en/support/article/CS204091. The basic idea is that a Stream's column definition is customizable, while a ValueStream has no such concept, being aimed purely at storing time-series data, and therefore has a fixed set of columns which are invisible to you.
Other than that, the article explains it far better than I can here.