Configuration of PDF OCR with Adobe LiveCycle Server ES3

February 11, 2015

We have scanned documents that are not searchable, so we would like to use the OCR functionality of Adobe LiveCycle Server to create searchable viewables. When we enable the OCR capability we receive an error, and the cause is unclear. Does anyone have experience configuring OCR with Adobe LiveCycle Server (ES3)?

icelynnin

12-Amethyst

This would be very valuable to have and would make AEM/PDF Publisher a worthwhile investment.

jw_CS

14-Alexandrite

If you are a bit handy (or search around a bit), you can use Python to extract the images from PDF files.
With PyTesseract (and Tesseract) you can do the OCR yourself and merge the resulting files back into a single PDF.
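
A rough sketch of that approach, not a tested Windchill integration. It assumes the `pdf2image` (which needs Poppler), `pytesseract` (which needs the Tesseract binary) and `pypdf` packages are installed. It renders whole pages to images rather than extracting the embedded images, which is a simplification. The function names are made up for illustration.

```python
import io
from typing import Callable, Iterable, List


def render_pages(pdf_path: str, dpi: int = 300) -> List:
    """Rasterise each page of a scanned PDF to a PIL image."""
    from pdf2image import convert_from_path  # assumption: pdf2image + Poppler installed
    return convert_from_path(pdf_path, dpi=dpi)


def ocr_page_to_pdf(image, lang: str = "eng") -> bytes:
    """Return a one-page PDF (page image plus invisible text layer) as bytes."""
    import pytesseract  # assumption: Tesseract binary is on PATH
    return pytesseract.image_to_pdf_or_hocr(image, extension="pdf", lang=lang)


def merge_pdfs(page_pdfs: Iterable[bytes], out_path: str) -> None:
    """Concatenate the single-page PDFs into one output file."""
    from pypdf import PdfWriter  # assumption: pypdf 3.x or newer
    writer = PdfWriter()
    for data in page_pdfs:
        writer.append(io.BytesIO(data))
    with open(out_path, "wb") as fh:
        writer.write(fh)


def make_searchable(
    pdf_path: str,
    out_path: str,
    render: Callable = render_pages,
    ocr: Callable = ocr_page_to_pdf,
    merge: Callable = merge_pdfs,
) -> int:
    """Render, OCR and merge; returns the number of pages processed.

    The three steps are passed in as parameters so each one can be
    swapped out (for example, a different OCR engine) or stubbed.
    """
    pages = render(pdf_path)
    page_pdfs = [ocr(page) for page in pages]
    merge(page_pdfs, out_path)
    return len(page_pdfs)
```

Calling `make_searchable("scan.pdf", "scan_ocr.pdf")` (both file names are placeholders) would write a PDF with a text layer that can be searched and indexed, provided the OCR quality is good enough.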

icelynnin

12-Amethyst

These things are all trivial in isolation. Doing them as part of the Windchill publish workflow is another matter, however.
PTC explicitly supports the use of AEM, which happens to have OCR out of the box. PTC does not support customisations and extensions, and provides very little documentation on how to modify the publication generation steps in that way.

Fortunately, Chromium has Pdf-Searchify, which uses Tesseract to OCR PDF images in the browser, but this doesn't produce indexable data for SOLR Server or Windchill Search Preview.
