We have scanned documents that are not searchable, so we would like to use the OCR functionality to create viewables using Adobe LiveCycle Server. When we enable the OCR capability we receive an error, and the cause is unclear. Does anyone have experience configuring OCR with Adobe LiveCycle Server (ES3)?
This would be very valuable to have and would make AEM/PDF Publisher a worthwhile investment.
If you are a bit handy (or search around a bit), you can use Python to extract the images from PDF files.
With PyTesseract (and Tesseract) you can do the OCR yourself and merge the resulting files back into a single PDF.
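A rough sketch of that approach, in case it helps. It assumes the page images have already been extracted from the scanned PDF, that the Tesseract binary is installed, and that the `pytesseract`, `Pillow` and `pypdf` packages are available. The function names and file names are made up for illustration, and none of this is tied to Windchill or the publisher.

```python
from io import BytesIO
from typing import Callable, Iterable, Sequence


def ocr_page_to_pdf(image_path: str) -> bytes:
    """OCR one page image and return a one-page searchable PDF as bytes."""
    # Third-party imports are kept local so the module loads without them.
    import pytesseract          # wrapper around the Tesseract binary
    from PIL import Image       # Pillow

    with Image.open(image_path) as img:
        # Tesseract renders the image plus an invisible text layer.
        return pytesseract.image_to_pdf_or_hocr(img, extension="pdf")


def merge_pdfs(pages: Iterable[bytes], out_path: str) -> None:
    """Concatenate in-memory PDFs into a single file on disk."""
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for data in pages:
        for page in PdfReader(BytesIO(data)).pages:
            writer.add_page(page)
    with open(out_path, "wb") as fh:
        writer.write(fh)


def make_searchable(
    image_paths: Sequence[str],
    out_path: str,
    ocr: Callable[[str], bytes] = ocr_page_to_pdf,
    merge: Callable[[Iterable[bytes], str], None] = merge_pdfs,
) -> int:
    """OCR each page image in the given order and merge the results.

    Returns the number of pages written.
    """
    pages = [ocr(path) for path in image_paths]
    merge(pages, out_path)
    return len(pages)
```

You would call it as something like `make_searchable(["page_001.png", "page_002.png"], "searchable.pdf")`. The pages are merged in the order you pass them, so sort the list yourself if the file names are not zero-padded.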
These things are all trivial in isolation. Doing them as part of the Windchill publish workflow is another matter, however.
PTC explicitly supports the use of AEM, which happens to have OCR out of the box. PTC does not support customisations and extensions, and provides very little documentation on how to modify the publication generation steps in that way.
Fortunately, Chromium has Pdf-Searchify, which uses Tesseract to OCR PDF images in the browser, but this doesn't produce indexable data for the Solr server or Windchill Search Preview.
I understand what you meant now.