Process large datasets

skef
11-Garnet

I need to process large datasets with ThingWorx Analytics. Using the API, I can create datasets with 1 million to 6 million rows. After creating the datasets, Analytics Builder becomes extremely slow and throws the following error:

 

GetDatasetConfigurationAMS: ERROR: JavaException: java.util.concurrent.TimeoutException: Timed out APIRequestMessage [requestId: 633, endpointId: -1, sessionId: -1, method: POST, entityName: localAnalytics_DataThing, characteristic: Services, target: GetDatasetSchema]

 

What is the maximum dataset size that ThingWorx Analytics can process?

7 REPLIES
cmorfin
19-Tanzanite
(To:skef)

Hi Skef

 

I don't have numbers for ThingWorx Analytics Builder, but for large datasets the Help Center does recommend uploading directly to the repository. See https://support.ptc.com/help/thingworx_hc/thingworx_analytics_8/#page/thingworx_analytics_8%2Fanalytics-data-upload-large.html

 

Hope this helps

Kind regards

Christophe

 

skef
11-Garnet
(To:cmorfin)

The problem is not storing the data in ThingWorx Analytics; it is processing the data with Analytics Builder. The error message indicates that an error occurred, but why? Maybe the dataset is too large? Is it possible to configure API timeouts?

cmorfin
19-Tanzanite
(To:skef)

Hi Skef

 

Apologies, I thought you got the error when uploading the dataset.

So if I understand correctly, you are able to upload a dataset of between 1 and 6 million rows, but you get this error when working with it.

If that is correct, could you clarify when exactly you get the error?

What operation do you perform to get this error?

If you repeat the same operation, do you always get this error, or does it work sometimes?

How many rows and columns does your dataset have?

What data types are they?

 

Also regarding your deployment:

Which version of ThingWorx Analytics is it?

Which OS version are you using?

Is it a native or Docker deployment?

How much RAM and how many processors does the server have?

 

Thank you

Kind regards

Christophe

 

skef
11-Garnet
(To:cmorfin)

Hello,

 

Thanks for your reply. I have answered your questions below:

 


So if I understand correctly, you are able to upload a dataset of between 1 and 6 million rows, but you get this error when working with it.


 


Yes, correct.

 

 

If that is correct, could you clarify when exactly you get the error?

I get this error in Analytics Builder, both when I open the Dataset menu (TW_ML_Datasets_v1) and when I open the large dataset (by double-clicking it in the list).

 


What operation do you perform to get this error?

 

Just opening the mashup or triggering the services by opening the mashups.

 


If you repeat the same operation, do you always get this error, or does it work sometimes?


Always.

 


How many rows and columns does your dataset have?

 


5 million rows; every row has 50 fields.

 

 

What data types are they?

 


String, Boolean, Integer

 


Also regarding your deployment:

Which version of ThingWorx Analytics is it?

Which OS version are you using?

Is it a native or Docker deployment?

 

Analytics version: 8.2

OS: Linux distribution

ThingWorx: Docker; Analytics services: native

 

 

How much RAM and how many processors does the server have?

 

RAM: 32 GB

CPU: Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz

 

Thank you very much!

 

cmorfin
19-Tanzanite
(To:skef)

Hi Skef

 

I have run some tests with a dataset of 5.2 million rows and 62 columns in ThingWorx Analytics 8.2.1, but I do not see any such issue.
However, this could be related to the data itself.

So, would it be possible for you to zip and upload the dataset JSON and CSV files so we can test them in-house?

 

Also, could you please reproduce the error and upload all of the following logs:

- <ThingWorxStorage>/logs folder

- <Tomcat>/logs folder

- ThingWorx Analytics logs, see https://www.ptc.com/en/support/article?n=CS268761 (to redirect the edge log into a file: journalctl --unit twas-edge-ms > edge.log)

- /usr/local/nginx/logs folder
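
For anyone collecting the folders listed above, here is a minimal Python sketch that bundles them into a single zip archive. It is an illustration only, not a PTC tool: the function name `bundle_logs` is hypothetical, and the folder paths in the usage comment are placeholders that must be replaced with the real locations on your server.

```python
import zipfile
from pathlib import Path


def bundle_logs(log_dirs, archive_path):
    """Zip every file under the given folders; return the number of files added."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for log_dir in log_dirs:
            root = Path(log_dir)
            if not root.is_dir():
                continue  # skip folders that do not exist on this host
            for path in sorted(root.rglob("*")):
                if path.is_file():
                    # Prefix entries with "<parent>/<folder>" so that several
                    # folders that are all named "logs" do not collide.
                    arcname = Path(root.parent.name, root.name, path.relative_to(root))
                    archive.write(path, arcname=arcname.as_posix())
                    count += 1
    return count


# Example call (placeholder paths, replace with your own):
# bundle_logs(["/path/to/ThingworxStorage/logs", "/path/to/tomcat/logs",
#              "/usr/local/nginx/logs"], "twx_logs.zip")
```

Write the archive outside the folders being collected, otherwise the partially written zip file could be picked up as one of its own entries.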

 

Thank you

Kind regards

Christophe

 

skef
11-Garnet
(To:cmorfin)

Thank you very much for your support, but I can't upload the data and log files because they are classified.

 

Could you at least explain (or guess) which timeout occurred? The service GetDatasetConfigurationAMS is a helper service of the helper Thing TW_ML_Helper, which tries to get the dataset schema. Is the dataset schema the metadata?

 

cmorfin
19-Tanzanite
(To:skef)

Hi Skef

 

No, I can't tell where the timeout occurs.

This is why I mentioned the different log files in my previous post, in the hope of isolating where the timeout occurs.

You could go through those logs and see whether any of them report a timeout.
The most obvious candidates are Tomcat and nginx, but it could come from somewhere else.
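
If the logs cannot be shared, they can still be searched locally. The following Python sketch walks a set of log folders and lists every line that mentions a timeout. The function name `find_timeouts` and the search pattern are assumptions made for illustration; the actual wording of timeout messages in Tomcat, nginx, or ThingWorx Analytics logs may differ, so adjust the pattern as needed.

```python
import re
from pathlib import Path

# Assumed pattern: matches spellings such as "TimeoutException",
# "timed out", "time-out", and "timeout" (case-insensitive).
TIMEOUT_PATTERN = re.compile(r"timed?[ -]?out", re.IGNORECASE)


def find_timeouts(log_dirs, pattern=TIMEOUT_PATTERN):
    """Return (file, line number, text) for every log line matching pattern."""
    hits = []
    for log_dir in log_dirs:
        root = Path(log_dir)
        if not root.is_dir():
            continue  # skip folders that do not exist on this host
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            # Undecodable bytes are replaced, so binary or compressed
            # files are scanned as text instead of raising an error.
            with path.open("r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        hits.append((str(path), number, line.rstrip()))
    return hits


# Example call (placeholder paths, replace with your own):
# for file, number, text in find_timeouts(["/path/to/tomcat/logs",
#                                          "/usr/local/nginx/logs"]):
#     print(f"{file}:{number}: {text}")
```

Comparing the timestamps of the hits with the time at which the Analytics Builder error appears may help narrow down which component timed out first.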

As of release 8.3, ThingWorx Analytics no longer uses nginx, so it might help to test in 8.3; that leaves one less component that could throw a timeout.

 

Hope this helps

Kind regards

Christophe

 
