Hi,
I am trying to create a metadata file, but my data contains both double and null values.
When I try to create a dataset, Analytics shows me this error:
Dataset Create Job with jobId: 56537633-72c0-46eb-ad6a-9e3206c2f659 failed with error:
java.lang.NumberFormatException: empty String
I think my metadata file does not accept null values, but I don't know how to change that.
Any advice?
Below is part of my metadata file.
{
"fieldName": "L0_S2_F36",
"values": null,
"range": null,
"dataType": "DOUBLE",
"opType": "CONTINUOUS"
},
The extract of your JSON looks OK.
The values field is there to specify the allowed values; if any value is acceptable, then null is OK.
range is to be specified only if a limited range of values is allowed; again, if not applicable, null is OK.
The error you get is more likely related to the CSV.
You possibly have an empty line at the end of the CSV, see https://www.ptc.com/en/support/article?n=CS295513
Hope this helps
Christophe
Hi,
I don't have an empty line.
This is a line of my data file:
63,,,,,,,,0.01....
The first value is the id and the others are double values.
As you can see, many of them are empty. I think the error refers to DOUBLE fields that have no numerical value.
Or is that not the case?
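One way to check both possibilities raised so far (blank lines and empty fields) is to scan the file once and count them. The following is only a minimal sketch in Python, not a ThingWorx Analytics tool; the function name `count_empty_fields` and the sample rows are made up for illustration, and it assumes a comma-separated file with no quoted multi-line fields.

```python
import csv
import io
from collections import Counter

def count_empty_fields(lines):
    """Scan CSV lines once; return (data_rows, blank_lines, empty_per_column).

    `lines` can be any iterable of text lines, for example an open file.
    Hypothetical helper for illustration only.
    """
    empty = Counter()
    data_rows = 0
    blank_lines = 0
    for row in csv.reader(lines):
        if not row:                      # csv.reader yields [] for a blank line
            blank_lines += 1
            continue
        data_rows += 1
        for index, value in enumerate(row):
            if value.strip() == "":      # field present but holds no value
                empty[index] += 1
    return data_rows, blank_lines, dict(empty)

# Made-up sample: two data rows followed by one blank line.
sample = io.StringIO("63,,,0.01\n64,0.5,,0.02\n\n")
rows, blank, empty = count_empty_fields(sample)
print(rows, blank, empty)
```

Because the rows are read one at a time, the same function could be given an open file handle (opened with `newline=""`) instead of the in-memory sample, without loading the whole file.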
Hi
Yes, indeed: if a field is defined as DOUBLE, it must contain a double value, not a null value.
You will need to prepare the data so that those null values are either removed or replaced.
See https://community.ptc.com/t5/IoT-Tech-Tips/Best-Practices-in-Data-Preparation-for-ThingWorx-Analytics/m-p/535271 for some information on preparing the data.
Kind regards
Christophe
Hi
The data file is very big (2 GB), and every line has empty fields.
Can you give me some advice on how to replace the empty fields, or on using another type in the metadata file?
Thanks.
You will need to work on the dataset itself to remove the missing values; there is nothing in the metadata file that will help with this.
Dealing with missing data is part of the data preparation step that is required for any machine learning activity.
This is very often the part that takes the longest, but it is very important because the quality of the model depends directly on the quality of the data. If you input inaccurate or poor data, your predictive model will be poor, so it is worth spending time on getting a clean dataset.
Unfortunately this is not always easy and it can be time consuming; for that reason I can't really offer you a complete solution here.
You can, though, find a lot of resources on the Internet by searching for "machine learning data preparation" or "data preparation missing values".
Posts like https://www.analyticsindiamag.com/5-ways-handle-missing-values-machine-learning-datasets/ or https://www.altexsoft.com/blog/datascience/preparing-your-dataset-for-machine-learning-8-basic-techniques-that-make-your-data-better/ are very common and give good information on how to handle this.
You can also contact PTC Professional Services, who can help with the operation, though there will likely be a charge for it.
Hope this helps
Kind regards
Christophe
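As an illustration of what "replacing" the missing values could look like for a file too large to load into memory, here is a minimal Python sketch that fills each empty field with the mean of its column, reading the data in two streaming passes. Mean imputation is just one of the common techniques covered by articles like those linked above, not a PTC recommendation, and whether it suits this data is a separate question. The function names, the `default` fallback for columns with no values at all, and the sample rows are all assumptions; the sketch also assumes the id is in the first column and that there is no header row.

```python
import csv
import io

def column_means(lines, id_columns=(0,)):
    """First pass: mean of the non-empty values in each non-id column."""
    sums, counts = {}, {}
    for row in csv.reader(lines):
        for index, value in enumerate(row):
            if index in id_columns or value.strip() == "":
                continue
            sums[index] = sums.get(index, 0.0) + float(value)
            counts[index] = counts.get(index, 0) + 1
    return {index: sums[index] / counts[index] for index in sums}

def fill_empty(lines, out, means, id_columns=(0,), default=0.0):
    """Second pass: write rows to `out`, replacing empty fields by the column mean.

    `default` (an arbitrary choice here) is used for columns that never
    contain a value. Completely blank lines are dropped.
    """
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(lines):
        if not row:
            continue
        writer.writerow([
            value if index in id_columns or value.strip() != ""
            else str(means.get(index, default))
            for index, value in enumerate(row)
        ])

# Made-up sample data: id followed by two DOUBLE columns.
data = "63,,0.25\n64,0.5,\n65,1.5,0.75\n"
means = column_means(io.StringIO(data))
cleaned = io.StringIO()
fill_empty(io.StringIO(data), cleaned, means)
print(cleaned.getvalue())
```

For a real file, each pass could open the CSV separately with `open(path, newline="")`, so only one row is held in memory at a time. If a column is empty in most rows, dropping that column from both the data and the metadata file may be a better choice than filling it.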
Thanks for everything.