Concurrent update data table causes inconsistent data ?

seanccc
17-Peridot

Hi,

I want to know how ThingWorx keeps data consistent during concurrent updates. I asked a similar question a few days ago and got a reply saying that if a property is set to persistent, the threads will be locked and executed one by one.

 

However, I tested that approach and still got inconsistent data. Any suggestions?

 

I created a reproducible case: 

1. Import the attached XML

2. Find the data table ConCurrentDataTable

3. Execute ResetTestData; it calls the asynchronous service _Update to simulate concurrent updates.

4. Execute ConCurrentUpdate; it updates the properties of the entry [Type=station1] with new values.

5. Check the table ConCurrentDataTable; you will find that only some of the properties were updated.
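For illustration, the behavior in steps 3 to 5 matches the classic read-modify-write race. A minimal generic sketch of that race (plain Node.js, not the ThingWorx API; all names here are made up for the demo):

```javascript
// Two concurrent updaters each read the whole entry, change one field,
// and write the whole entry back; the later write silently discards
// the earlier one's change (the "lost update").
const table = { station1: { fieldA: "old", fieldB: "old" } };

async function update(field, value) {
  const snapshot = { ...table.station1 };     // read the entry
  await new Promise(r => setTimeout(r, 10));  // simulated service latency
  snapshot[field] = value;                    // modify one field
  table.station1 = snapshot;                  // write the whole entry back
}

async function main() {
  // Both updates take their snapshot before either writes back
  await Promise.all([update("fieldA", "new"), update("fieldB", "new")]);
  return table.station1;                      // one change is lost
}
```

Running this shows that at most one of the two field updates survives, which is the same symptom as in step 5.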

 

Regards,

Sean

1 ACCEPTED SOLUTION

mhollenbach
5-Regular Member
(To:seanccc)

You will most likely want to offload this data to another DB, i.e., Option 3 in your post.

 

Entering 6 million data points annually will be better served by the Database ThingTemplate route, with a separate server hosting a DB to manage the data. Performance of queries and purging will be a concern for you in the future with this much data. We also support writing custom SQL services (a dropdown in the custom JavaScript service dialog) to query out to the database you create with the Database ThingTemplate.


3 REPLIES
slangley
23-Emerald II
(To:seanccc)

Hi @seanccc.

 

Looking at your code, what do the stations represent? We're trying to understand your use case.

 

ThingWorx data tables are intended for static data, not data that changes frequently. If your users are making frequent edits, data tables may not be the appropriate solution.

 

If we can better understand what this data represents, we'll be happy to offer solutions.

 

Regards.

 

--Sharon

seanccc
17-Peridot
(To:slangley)

@slangley , 

 

The uploaded XML is just a simplified data table to demonstrate the concurrency situation. The real case is the following:

 

My customer is a factory with about 50 production lines. Each production line has an average of 14 stations, and each station has one device.
Each production line runs 3 production shifts per day. Before each start of production, a start order must be generated. This start order includes the check items that each station needs to verify before starting production. There are 8 check items per station, which may be device parameters or manual checks. Once data arrives from the device, or the operator marks the check box of a check item, the result of that check item must be updated to Pass or Fail.

 

Therefore, the annual data size is as follows:

50 production lines * 14 stations * 8 check items * 3 shifts per day * 365 days = 6,132,000 (~6 million)

 

However, the official documentation indicates that a Data Table should hold about 300,000 to 500,000 rows, so the Data Table is designed at the start-order level instead of the check-item level:

 

Field               Data Type   Primary Key
LineNumber          STRING      Yes
ShiftNumber         STRING      Yes
StationCheckItems   JSON        No

 

The annual record count would then be only:

50 production lines * 3 shifts per day * 365 days = 54,750
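Both row-count estimates check out; as a quick sanity check in plain JavaScript:

```javascript
// Double-checking the row-count arithmetic in this post
const checkItemRows = 50 * 14 * 8 * 3 * 365;  // one row per check item per shift
const startOrderRows = 50 * 3 * 365;          // one row per start order

console.log(checkItemRows);   // 6132000, i.e. ~6.1 million rows/year
console.log(startOrderRows);  // 54750 rows/year
```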

 

The format of StationCheckItems is (for simplicity, only the main fields are listed):

{
  "Stations": [
    {
      "StationName": "",
      "StationNumber": "",
      "CheckItems": [
        {
          "CheckItemName": "",
          "ExpectedFrom": "",
          "ExpectedTo": "",
          "ActualValue": ""
        }
      ]
    }
  ]
}
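For concreteness, updating a single check item inside a parsed StationCheckItems document might look like this (plain JavaScript; the helper name updateCheckItem and the sample values are made up, only the field names come from the format above):

```javascript
// Hypothetical helper: locate one check item by station number and
// check item name, then record its actual value.
function updateCheckItem(doc, stationNumber, checkItemName, actualValue) {
  const station = doc.Stations.find(s => s.StationNumber === stationNumber);
  const item = station &&
    station.CheckItems.find(c => c.CheckItemName === checkItemName);
  if (!item) return false;        // unknown station or check item
  item.ActualValue = actualValue;
  return true;
}

// Sample document shaped like the format above (values are invented)
const doc = {
  Stations: [{
    StationName: "Station 1",
    StationNumber: "S1",
    CheckItems: [{
      CheckItemName: "Pressure",
      ExpectedFrom: "1.0",
      ExpectedTo: "2.0",
      ActualValue: ""
    }]
  }]
};
```

Because every such update rewrites the one JSON field for the whole shift, two stations reporting at the same time contend on the same row, which is where the concurrency problem arises.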

 

So the field StationCheckItems will be updated concurrently, since it covers multiple stations and check items. The easiest way is to take a lock on update so that updates mutually exclude each other; although this affects throughput somewhat, that is not a problem in my case.

 

I also considered other approaches:

  1. Create a separate Data Table for each production line. I'm not sure whether this mitigates the performance pressure, since all Data Tables actually live in the same PostgreSQL table.
  2. Use a separate persistence provider for each line to reduce the amount of data per Data Table.
  3. Use JDBC (Database ThingTemplate) to update the field StationCheckItems directly (something like SELECT ... FOR UPDATE).
  4. Implement a lock in a Java service to serialize the updates.

 

But all of these approaches seem complex to me.
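For what it's worth, the idea behind option 4 can be sketched with a simple promise-chain mutex (shown here in plain JavaScript rather than a Java service, which would use synchronized or a ReentrantLock instead; all names are illustrative):

```javascript
// A minimal async mutex: each task is chained onto the previous one,
// so every read-modify-write finishes before the next one starts.
// This serialization prevents the lost update.
let chain = Promise.resolve();
function withLock(task) {
  const run = chain.then(task);
  chain = run.catch(() => {});   // keep the chain alive after failures
  return run;
}

const table = { station1: { fieldA: "old", fieldB: "old" } };

function update(field, value) {
  return withLock(async () => {
    const snapshot = { ...table.station1 };     // read the entry
    await new Promise(r => setTimeout(r, 10));  // simulated latency
    snapshot[field] = value;                    // modify one field
    table.station1 = snapshot;                  // write it back
  });
}
```

With the lock in place, two concurrent calls such as update("fieldA", "new") and update("fieldB", "new") both survive, because the second snapshot is taken only after the first write completes.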