Hello,
I have a requirement to match content from a PDF file against data from an SAP system. So I first want to extract the content of the PDF and store it in variables using ThingWorx services.
If possible, please explain how we can achieve that.
Thanks
KSM
Hi @KSM, it is absolutely possible, but it is not exactly easy.
You have two options:
1. Create a ThingWorx extension that reads the PDF and returns its text. One library you can use for this is Apache PDFBox: https://pdfbox.apache.org/download.html.
Based on what I found online, building such an extension should be fairly quick. Keep in mind, though, that the result depends on how the information is stored in the PDF: this library extracts embedded text but does not perform OCR, so text that exists only as an image (for example, a scanned page) will not be recovered.
2. If you have ThingWorx Flow, you can extend the Azure Computer Vision connector (or use it as inspiration) to call the Azure Computer Vision service with a PDF as input and perform OCR on it. Note that the standard connector can already extract text if you give it an image as input: https://support.ptc.com/help/thingworx/platform/r9/en/index.html#page/ThingWorx/Help/Integration_Orchestration/Azure/ComputerVision.html#
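Once either option gives you the PDF content as a plain string (for example, the string returned by PDFBox's `PDFTextStripper.getText`), the matching against SAP can be done in ordinary Java inside the extension. The sketch below is hypothetical: the class `PdfSapMatcher` and its methods `parseFields` and `mismatches` are illustrative names, and it assumes the PDF text contains one `Label: value` pair per line and that the SAP values have already been fetched into a map (how they are read from SAP is not covered here).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compares fields extracted from PDF text with values taken from SAP. */
class PdfSapMatcher {

    // Assumed layout: one "Label: value" pair per line, e.g. "Invoice Number: 4711".
    // The label stops at the first colon; the value may contain further colons.
    private static final Pattern FIELD = Pattern.compile(
            "^[ \\t]*([^:\\r\\n]+?)[ \\t]*:[ \\t]*([^\\r\\n]*?)[ \\t]*$", Pattern.MULTILINE);

    /** Parses "Label: value" lines into an ordered map, keeping the first value seen per label. */
    static Map<String, String> parseFields(String pdfText) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(pdfText);
        while (m.find()) {
            fields.putIfAbsent(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Returns every SAP label whose PDF value is missing or different, with both values for logging. */
    static Map<String, String> mismatches(Map<String, String> pdfFields, Map<String, String> sapFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sapFields.entrySet()) {
            String pdfValue = pdfFields.get(e.getKey()); // null if the label is absent from the PDF
            if (!Objects.equals(e.getValue(), pdfValue)) {
                result.put(e.getKey(), "PDF '" + pdfValue + "' vs SAP '" + e.getValue() + "'");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the text a PDFBox-based extension would return.
        String pdfText = "Invoice Number: 4711\nAmount: 120.50\nCurrency: EUR\n";

        // Stand-in for values read from SAP.
        Map<String, String> sap = new LinkedHashMap<>();
        sap.put("Invoice Number", "4711");
        sap.put("Amount", "120.00");
        sap.put("Currency", "EUR");

        // Prints {Amount=PDF '120.50' vs SAP '120.00'}
        System.out.println(mismatches(parseFields(pdfText), sap));
    }
}
```

Keeping the parsing separate from the PDF extraction means the matching logic can be tested without a real PDF. Note that the comparison is an exact string match, so values formatted differently in the two systems (for example `120.5` vs `120.50`) would need to be normalized first.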
Hi Vladimir,
Thanks a lot for your suggestion.
Is there already a ThingWorx extension that can read a PDF and convert it into text?
Thanks
KSM
Hi @KSM,
I cannot tell whether you have a ThingWorx extension (you asked whether "we have" one), but I assume you wanted to know whether PTC provides such an extension out of the box (OOTB). In that case, the answer is no.
However, be aware that PTC can build such an extension through our professional services. If you want to pursue this, please contact your salesperson or CSM.