Hi @KSM, it is absolutely possible, but it is not exactly easy.
You have two options:
1. Create a ThingWorx extension that reads the PDF and gives you back the text. Apache PDFBox is one library you can use for this: https://pdfbox.apache.org/download.html.
From what I have read online, building the extension should be fairly quick, but keep in mind that the result depends on how the information is stored in the PDF. This library won't do OCR, so it only helps if the PDF contains real text rather than scanned images.
2. If you have ThingWorx Flow, you can extend the Azure Computer Vision connector (or use it as inspiration) so that it sends a PDF to the Azure Computer Vision service and performs OCR on it. Note that the standard connector can already extract text if you give it an image as input. https://support.ptc.com/help/thingworx/platform/r9/en/index.html#page/ThingWorx/Help/Integration_Orchestration/Azure/ComputerVision.html#
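
To decide between the two options for a given file, one approach is to try the plain text extraction first and fall back to OCR only when almost nothing comes back. Below is a minimal sketch of that check in plain Java. The class name `PdfTextCheck`, the method names and the threshold of 20 characters per page are all my own assumptions, not something from ThingWorx or PDFBox, so tune them for your documents. The text you pass in would come from the extractor, for example PDFBox's `PDFTextStripper.getText(document)`.

```java
/**
 * Hypothetical helper (not part of ThingWorx or PDFBox): guesses whether a PDF
 * is a scan that needs OCR, based on how much text a plain extractor returned.
 */
final class PdfTextCheck {

    // Assumed threshold: fewer letters/digits per page than this suggests a scanned PDF.
    static final int MIN_CHARS_PER_PAGE = 20;

    private PdfTextCheck() {
        // static helpers only
    }

    /** Counts letters and digits, ignoring whitespace and punctuation. */
    static int countAlphanumeric(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns true when the extracted text is too sparse to be useful,
     * which is a hint (not proof) that the pages are images and need OCR.
     */
    static boolean needsOcr(String extractedText, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        if (extractedText == null) {
            return true;
        }
        long required = (long) MIN_CHARS_PER_PAGE * pageCount;
        return countAlphanumeric(extractedText) < required;
    }
}
```

Inside the extension service you would call `PdfTextCheck.needsOcr(text, pageCount)` after extracting the text: if it returns false, return the text directly (option 1), otherwise hand the file over to the OCR route (option 2). This is only a heuristic, so a PDF with very little real text per page would also be flagged.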