11-Garnet
July 10, 2023
Solved

ThingWorx data hygiene


Is there any ThingWorx built-in API or services for data hygiene?

 

Best answer by VladimirRosu_116627

Clear. As I said before, we usually clean training data with the external tools we're familiar with (Jupyter Notebook, etc.) before adding it to ThingWorx Analytics.

You can use the many snippets available in ThingWorx services (see below) to perform this type of cleaning, but they do not match the speed that those tools offer. That does not mean they don't work, just that people in ML land are far more familiar with tools like Jupyter.

Snippets available in any service editor:

[Screenshot: VladimirRosu_0-1689242061915.png]

 

One last thing: typically the ETL process takes the most time. In what I have seen, it accounts for 60-80% of the total time spent on Analytics.

Actually using ThingWorx Analytics is usually a very easy process that takes much less time: just load the training dataset, set your goal, and let the system crunch; rinse and repeat whenever needed.

Don't be shy about using external ETL tools to process and clean your data. As I said, ThingWorx Analytics itself is not intended to be a replacement for an ETL tool.

1 reply

22-Sapphire I
July 11, 2023

Could you define what you mean by 'data hygiene', please? Thanks!

5-Regular Member
July 11, 2023

Hi, I'm working with Nelson who originally posed the question about data hygiene.

 

Raw data commonly contains errors or is incomplete, duplicated, or incorrect. Having a data hygiene process to clean the data is common in machine learning. Data hygiene can cover error handling, standardization, normalization, missing data, and duplicate data. It's also important to suppress data that doesn't provide value. In Python, libraries such as pandas and NumPy help with this: dealing with missing values (NaN), removing whitespace, and checking the unique values of columns are just a few basic techniques for cleaning data.

 

The ThingWorx documentation describes how to handle missing data. What other data hygiene features does ThingWorx have?

22-Sapphire I
July 11, 2023

Similar to what you describe doing with Python, you would do the same kinds of things within ThingWorx, or even at the Edge before transmitting data: use Services to detect those issues and resolve them before sending the data on to Analytics or elsewhere.

You can use the 'InfoTable' Services and JavaScript to do this.
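
To give a rough idea of what such a service could look like, here is a minimal sketch in plain JavaScript. It is not a built-in ThingWorx API: it works on an ordinary array of row objects, and the function name `cleanRows` and the field names `deviceId` and `temperature` are made up for illustration. In a real service you would read the rows from your InfoTable and write the result back with the InfoTable services or snippets you prefer. It trims whitespace, drops rows with missing values, and removes duplicates, which are the basic steps mentioned earlier in the thread.

```javascript
// Sketch only: plain JavaScript over an array of row objects.
// cleanRows, deviceId and temperature are hypothetical names.
function cleanRows(rows, requiredFields, keyField) {
    var seen = {};
    var cleaned = [];
    for (var i = 0; i < rows.length; i++) {
        // Copy the row so the input is not modified, trimming string values.
        var row = {};
        for (var field in rows[i]) {
            if (!Object.prototype.hasOwnProperty.call(rows[i], field)) { continue; }
            var value = rows[i][field];
            row[field] = (typeof value === "string") ? value.trim() : value;
        }
        // Skip rows where a required field is missing, empty, or NaN.
        var complete = true;
        for (var j = 0; j < requiredFields.length; j++) {
            var v = row[requiredFields[j]];
            if (v === undefined || v === null || v === "" ||
                (typeof v === "number" && isNaN(v))) {
                complete = false;
                break;
            }
        }
        if (!complete) { continue; }
        // Keep only the first row seen for each key value.
        var key = String(row[keyField]);
        if (Object.prototype.hasOwnProperty.call(seen, key)) { continue; }
        seen[key] = true;
        cleaned.push(row);
    }
    return cleaned;
}

var raw = [
    { deviceId: " pump-1 ", temperature: 21.5 },
    { deviceId: "pump-1", temperature: 21.5 },  // duplicate once trimmed
    { deviceId: "pump-2", temperature: NaN },   // missing reading
    { deviceId: "pump-3", temperature: 19.0 }
];
var cleaned = cleanRows(raw, ["deviceId", "temperature"], "deviceId");
```

Here `cleaned` keeps one row for `pump-1` and the row for `pump-3`. Whether you drop incomplete rows or fill them in instead depends on your data and on what Analytics needs, so treat the rules above as placeholders.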