
Configuration of PDF OCR with Adobe LiveCycle Server ES3

ArjenBlok
10-Marble

We have scanned documents that are not searchable, so we would like to use OCR to create searchable viewables with Adobe LiveCycle Server. When we enable the OCR capability we receive an error, and the cause is unclear. Does anyone have experience configuring OCR with Adobe LiveCycle Server (ES3)?


This would be very valuable to have and would make AEM/PDF Publisher a worthwhile investment. 

jw_CS
13-Aquamarine
(To:icelynnin)

If you are a bit handy (or search around a bit), you can use Python to extract the images from PDF files.
With PyTesseract (and Tesseract) you can do the OCR yourself and then merge the pages back into a PDF.
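A minimal sketch of that approach, assuming the third-party packages pdf2image (which needs poppler), pytesseract (which needs the Tesseract binary) and pypdf are installed. Instead of extracting embedded image objects, it renders each page to an image at 300 dpi, which usually works well for scans. Tesseract then turns each image into a one-page PDF with an invisible text layer, and pypdf concatenates those pages. All function names are hypothetical. The third-party imports sit inside the step functions, so you can swap out any step (for example, the OCR engine) without touching the rest.

```python
import io
from typing import Callable, Iterable, List


def rasterize_pdf(path: str, dpi: int = 300) -> list:
    # Render each PDF page to a PIL image (pdf2image calls poppler under the hood).
    from pdf2image import convert_from_path
    return convert_from_path(path, dpi=dpi)


def ocr_to_pdf_page(image) -> bytes:
    # Tesseract returns a one-page PDF: the scanned image plus an invisible
    # text layer, which is what makes the page searchable.
    import pytesseract
    return pytesseract.image_to_pdf_or_hocr(image, extension="pdf")


def merge_pdf_pages(pages: Iterable[bytes]) -> bytes:
    # Concatenate the single-page PDFs back into one document.
    from pypdf import PdfReader, PdfWriter
    writer = PdfWriter()
    for page_bytes in pages:
        for page in PdfReader(io.BytesIO(page_bytes)).pages:
            writer.add_page(page)
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()


def searchify_pdf(
    src: str,
    dst: str,
    rasterize: Callable[[str], list] = rasterize_pdf,
    ocr_page: Callable[[object], bytes] = ocr_to_pdf_page,
    merge: Callable[[Iterable[bytes]], bytes] = merge_pdf_pages,
) -> int:
    """OCR every page of `src`, write a searchable PDF to `dst`,
    and return the number of pages processed."""
    images = rasterize(src)
    ocr_pages: List[bytes] = [ocr_page(img) for img in images]
    with open(dst, "wb") as fh:
        fh.write(merge(ocr_pages))
    return len(ocr_pages)
```

Usage would be along the lines of `searchify_pdf("scan.pdf", "scan_ocr.pdf")`. This is a standalone script. It does not plug into any publishing workflow.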

icelynnin
12-Amethyst
(To:jw_CS)

These things are all trivial in isolation. Doing them as part of the Windchill publishing workflow is another matter, however.
PTC explicitly supports the use of AEM, which happens to have OCR out of the box. PTC does not support customisations and extensions, and provides very little documentation on how to modify publication-generation steps like that.


Fortunately, Chromium has Pdf-Searchify, which uses Tesseract to OCR PDF images in the browser, but this doesn't produce indexable data for the Solr server or Windchill Search Preview.

jw_CS
13-Aquamarine
(To:icelynnin)

I understand what you meant now.
