Hi,
My customer wants to use ThingWorx Analytics to train models. The data volume for training a model (after data cleansing) would be several million records, but may grow to more than 10 million in the future. The customer wants to train different models, or retrain the same model with different data, monthly or biweekly, and ideally every few days with the updated raw data.
The questions are:
1. What kind of hardware is required for the customer's needs? Is a GPU required in this case, and can ThingWorx Analytics utilize a GPU?
2. Does ThingWorx Analytics support cluster deployment, not only for availability but also for balancing computing resources?
Is there any official documentation for the above questions?
The only document I can find is https://www.ptc.com/support/-/media/CE3479EC89B74CD7891F65F2E47BE04D.pdf?sc_lang=en , but it looks like a strategy guide for cluster deployment rather than an installation guide with detailed steps, and it is for a version earlier than 8.5, so some of it may be out of date.
Regards,
Sean
@seanccc
Thank you for posting to the PTC Community.
The guide you linked is actually a little outdated, but the information/concepts are still valid.
Analysis and modeling on 10 million records is fine on a standalone installation, if tuned correctly, and load balancing is not required for that amount of data. We do not have an 8.5 distributed installation package, but I can inquire whether and how this can be achieved with 8.5 if you would like.
ThingWorx Analytics does not utilize GPU computation at this time; everything is processed by the CPU and the machine RAM allocated to the application.
Please see the help center documentation for the latest information around system requirements: https://support.ptc.com/help/thingworx_hc/thingworx_analytics_8/#page/analytics%2F8_5_1_analytics_system_requirements.html%23wwID0EXCEQ
Please let me know if you have any additional questions.
Regards,
Neel
@nsampat ,
Thank you for the reply. My customer wants to know the details of building a distributed ThingWorx Analytics deployment in 8.5. Please help inquire how to set up distributed analytics properly. Thank you.
Regards,
Sean
@seanccc ,
I have put out a request to our Product Management team to review this.
Due to the holidays, please forgive any delays in feedback.
Regards,
Neel
@seanccc ,
I have confirmed with our R&D that you can build a Distributed deployment of ThingWorx Analytics by using the Dockerfiles to build Docker images for the Analytics Components. You would deploy these on a server or a container orchestration platform (e.g. Kubernetes).
You can find the source files here: ThingWorx Analytics - Download Portal
If you need further assistance or guidance with this setup, it would be best to open a Case with Technical Support.
Please let me know if this answers your question by marking this comment as Solution Accepted.
Regards,
Neel
Just to add to what Neel indicated, the distributed (via Dockerfile) deployment is only useful when a large number of jobs need to be executed at the same time.
Those jobs can then be load balanced among the different nodes/workers.
However, from the scenario you described, it seems that the customer will want to execute one (or a few) potentially large jobs on a regular basis. Such a large job will be allocated to one worker only; it is not possible to split one job between several workers. So the distributed implementation will not bring any benefit here, and it will add a lot of complexity and cost of ownership.
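To make the point about whole-job allocation concrete, here is a small Python sketch. It is only a toy model and not how ThingWorx Analytics actually schedules work: the makespan function, the greedy assignment rule, and the job durations are all assumptions made up for illustration. It just shows that when a job cannot be split, adding workers shortens the total time for many small jobs but not for one large job.

```python
import heapq


def makespan(job_durations, n_workers):
    """Total elapsed time when every job runs whole on exactly one worker.

    Toy model (an assumption, not the product's scheduler): jobs are taken
    longest first and each goes to the worker that becomes free earliest.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    # Min-heap holding the time at which each worker becomes free.
    free_times = [0.0] * n_workers
    heapq.heapify(free_times)
    for duration in sorted(job_durations, reverse=True):
        earliest_free = heapq.heappop(free_times)
        heapq.heappush(free_times, earliest_free + duration)
    return max(free_times)


if __name__ == "__main__":
    one_large_job = [120.0]       # hypothetical duration in minutes
    many_small_jobs = [5.0] * 8   # hypothetical durations in minutes

    print(makespan(one_large_job, 1))    # 120.0
    print(makespan(one_large_job, 4))    # 120.0, extra workers sit idle
    print(makespan(many_small_jobs, 1))  # 40.0
    print(makespan(many_small_jobs, 4))  # 10.0
```

With a single large job the result is the same for 1 worker and for 4 workers, which is why the memory available to that one worker matters more than the number of nodes.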
To handle this situation with one large job, you would probably be best served by using the standard installation and increasing the memory allocated to the workers so that they can process the large jobs.
Increasing the memory of the worker is described at https://www.ptc.com/en/support/article?n=CS294545.
The following articles can also be of interest:
Technical considerations for performance tuning in ThingWorx Analytics
Information about ThingWorx Analytics Installation and architecture
Hope this helps
Christophe