write metadata properties to accept double or null value to create dataset

alessandro96
6-Contributor

Hi,

I am trying to create a metadata file, but my data contains both double values and null (empty) values.

When I try to create a dataset, Analytics shows me this error:

Dataset Create Job with jobId: 56537633-72c0-46eb-ad6a-9e3206c2f659 failed with error:
java.lang.NumberFormatException: empty String

I think the file does not accept null values, but I don't know how to allow them.

Any advice?

Below is part of my metadata file.

{
"fieldName": "L0_S2_F36",
"values": null,
"range": null,
"dataType": "DOUBLE",
"opType": "CONTINUOUS"
},

1 ACCEPTED SOLUTION

Accepted Solutions
cmorfin
19-Tanzanite
(To:alessandro96)

Hi @alessandro96 

 

You will need to work on the dataset itself to remove the missing values; nothing in the metadata file will help with this.

Dealing with missing data is part of the data preparation step that is required for any machine learning activity.
This part very often takes the longest, but it is very important because the quality of the model depends directly on the quality of the data. If you input inaccurate or poor data, your predictive model will be poor, so it is worth spending time on getting a clean dataset.

Unfortunately, this is not always easy and it can be time consuming, so I can't really offer you a complete solution here.
You can, though, find a lot of resources on the Internet by searching for "machine learning data preparation" or "data preparation missing values".

Posts like https://www.analyticsindiamag.com/5-ways-handle-missing-values-machine-learning-datasets/ or https://www.altexsoft.com/blog/datascience/preparing-your-dataset-for-machine-learning-8-basic-techniques-that-make-your-data-better/ are very common and give good information on how to handle this.
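
As one illustration of the kind of technique those posts describe, here is a minimal Python sketch of mean imputation: each empty numeric field is replaced by the mean of the non-empty values in the same column. It is only a sketch under assumptions that are not confirmed in this thread: the CSV has a header row, column 0 is an id that should be left alone, and every other non-empty cell parses as a number. The function name `impute_with_column_means` and its parameters are hypothetical. The file is read twice, row by row, so it is never loaded into memory as a whole.

```python
import csv

def impute_with_column_means(src, dst, skip_columns=(0,)):
    """Replace empty cells with the mean of their column (two streaming passes).

    Assumes a header row and numeric data outside skip_columns.
    Returns the computed means, keyed by column index.
    """
    sums, counts = {}, {}
    # Pass 1: accumulate sum and count of the non-empty cells in each column.
    with open(src, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            for i, cell in enumerate(row):
                if i in skip_columns or cell.strip() == "":
                    continue
                sums[i] = sums.get(i, 0.0) + float(cell)
                counts[i] = counts.get(i, 0) + 1
    means = {i: sums[i] / counts[i] for i in sums}

    # Pass 2: copy the file, filling empty cells where a mean is available.
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            if not row:
                continue  # skip completely blank lines
            writer.writerow([
                str(means[i]) if cell.strip() == "" and i in means else cell
                for i, cell in enumerate(row)
            ])
    return means
```

Columns that contain no values at all have no mean and are left empty, so they would still need to be dropped or handled separately. Whether the mean is a sensible replacement depends on the data; it is shown here only because it is simple to stream.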

 

You can also contact PTC Professional Services who can help with the operation, though there will likely be a charge for it.

 

Hope this helps

Kind regards

Christophe

 

6 REPLIES
cmorfin
19-Tanzanite
(To:alessandro96)

Hi @alessandro96 

 

The extract of your JSON looks OK.

The values field is there to specify the allowed values; if any value is acceptable, then null is OK.

The range field is to be specified if only a range of values is allowed; again, if not applicable, null is OK.

 

The error you get is more likely related to the CSV.

You possibly have an empty line at the end of the CSV; see https://www.ptc.com/en/support/article?n=CS295513
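
To check a CSV for this kind of problem before uploading it, a small script can help. The following Python sketch counts blank lines and empty fields per column while streaming the file row by row. The function name `scan_csv` and the `has_header` parameter are hypothetical, and the sketch assumes a plain comma-separated file; a row whose fields are all empty is counted as a blank line.

```python
import csv
from collections import Counter

def scan_csv(path, has_header=True):
    """Return (blank_lines, empty_fields_per_column) for a CSV file.

    Rows are streamed one at a time, so large files are not loaded into memory.
    Columns are keyed by header name when available, otherwise by index.
    """
    blank_lines = 0
    empty_per_column = Counter()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None) if has_header else None
        for row in reader:
            # csv.reader yields [] for a completely blank line.
            if not row or all(cell.strip() == "" for cell in row):
                blank_lines += 1
                continue
            for i, cell in enumerate(row):
                if cell.strip() == "":
                    name = header[i] if header and i < len(header) else i
                    empty_per_column[name] += 1
    return blank_lines, dict(empty_per_column)
```

A non-zero blank-line count points to the empty-line case described in the article above, while non-zero per-column counts show which fields contain empty values.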

 

Hope this helps

Christophe

 

 

alessandro96
6-Contributor
(To:cmorfin)

Hi,

I don't have an empty line.

This is a line from my data file:

63,,,,,,,,0.01....

The first value is the id and the others are double values.

As you can see, a lot of them are empty. I think the error refers to double fields that have no numerical value.

Or is that not right?

 

cmorfin
19-Tanzanite
(To:alessandro96)

Hi

 

Yes, indeed: if the field is defined as DOUBLE, it should contain a double value, not a null value.

You will need to prepare the data so that those null values are either removed or replaced.

See https://community.ptc.com/t5/IoT-Tech-Tips/Best-Practices-in-Data-Preparation-for-ThingWorx-Analytics/m-p/535271 for some information on preparing the data.
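
As a rough illustration of "removed or replaced", the Python sketch below copies a CSV row by row and either drops every row that has an empty field or writes a fixed fill value into the empty fields. The function name `clean_csv` and its parameters are hypothetical, the sketch assumes a plain comma-separated file, and the right fill value (if any) depends on what the data means; nothing here is specific to ThingWorx Analytics.

```python
import csv

def clean_csv(src, dst, fill=None, has_header=True):
    """Copy src to dst, handling empty fields one row at a time.

    fill=None   -> drop every row that has at least one empty field
    fill="0"    -> keep the row and write the fill string into empty fields
    Returns (rows_kept, rows_dropped); blank lines count as dropped.
    """
    kept = dropped = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        for row in reader:
            if not row:
                dropped += 1  # completely blank line
                continue
            if any(cell.strip() == "" for cell in row):
                if fill is None:
                    dropped += 1
                    continue
                row = [fill if cell.strip() == "" else cell for cell in row]
            writer.writerow(row)
            kept += 1
    return kept, dropped
```

If every row has at least one empty field, dropping rows leaves nothing, so in that case only the fill mode (or removing mostly empty columns first) produces a usable file.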

 

Kind regards

Christophe

 

alessandro96
6-Contributor
(To:cmorfin)

Hi

The data file is very big, 2 GB, and every line has empty fields.

Can you give me some advice on how to replace the empty fields, or should I use another type in the metadata file?

thanks.

alessandro96
6-Contributor
(To:cmorfin)

Thanks for everything.
