
Is it possible to extract PDF content and store it in a string variable using ThingWorx?

KSM
14-Alexandrite

Hello,

I have a requirement to match content from a PDF file against content from an SAP system. To do that, I first want to extract the content of the PDF and store it in variables using ThingWorx services.

 

If this is possible, please explain how we can achieve it.

 

Thanks

KSM

 

1 ACCEPTED SOLUTION

VladimirRosu
19-Tanzanite
(To:KSM)

Hi @KSM , it is absolutely possible, but it is not exactly easy.

You have two options:

1. Create a ThingWorx extension that reads the PDF and gives you back the text. Apache PDFBox is one library you can use for this: https://pdfbox.apache.org/download.html.

From the information I found online, creating such an extension should be quick. Remember, though, that the result depends on how the information is stored in the PDF: this library won't do OCR, so text that exists only as a scanned image will not be extracted.

2. If you have ThingWorx Flow, you can extend the Azure Computer Vision connector (or use it as inspiration) so that it accepts a PDF as input and performs OCR on it. Note that the standard connector can already extract text if you give it an image as input. https://support.ptc.com/help/thingworx/platform/r9/en/index.html#page/ThingWorx/Help/Integration_Orchestration/Azure/ComputerVision.html#
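
For option 1, the PDFBox part on its own could look roughly like the sketch below. It assumes the PDFBox 2.x API (`PDDocument.load`); PDFBox 3.x moved loading to `org.apache.pdfbox.Loader`, so adjust for the version you bundle. The class name `PdfTextExtractor` and the helper `hasTextLayer` are illustrative names only, and wiring the method into an actual ThingWorx extension service (annotations, reading the file from a file repository) is not shown here.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class; not part of ThingWorx or PDFBox.
public class PdfTextExtractor {

    /**
     * Returns the text layer of the PDF as a single String.
     * No OCR is performed, so an image-only (scanned) PDF
     * yields an empty or whitespace-only result.
     */
    public static String extractText(InputStream pdfStream) throws IOException {
        // PDFBox 2.x loading call (assumption about the bundled version).
        // try-with-resources closes the document when done.
        try (PDDocument document = PDDocument.load(pdfStream)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    /** True if the extraction produced any non-whitespace text. */
    public static boolean hasTextLayer(String extracted) {
        return extracted != null && !extracted.trim().isEmpty();
    }
}
```

The returned String is what you would hand back as the service result and then compare with the SAP data. If `hasTextLayer` returns false, the PDF most likely contains only scanned images, and an OCR route such as option 2 is needed instead.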


KSM
14-Alexandrite
(To:VladimirRosu)

Hi Vladimir,

Thanks a lot for your suggestion.

Is there already a ThingWorx extension that can read a PDF and convert it to text?

 

 

Thanks

KSM

 

VladimirRosu
19-Tanzanite
(To:KSM)

Hi @KSM ,

 

I cannot say whether you have such a ThingWorx extension (you asked if "we have" one), but I assume you wanted to know whether PTC provides one out of the box (OOTB). The answer in this case is no.

However, be aware that PTC can build such an extension through our professional services. If you want to do this, please contact your salesperson or CSM.

 

KSM
14-Alexandrite
(To:VladimirRosu)

Thanks, @VladimirRosu, for the information.
