1-Visitor
January 24, 2012
Question

Question about PDF search indexes

Adepters:

Some of our deliverables are PDF collections (dozens of PDFs that can be launched from a Web page). We generate these with PE 5.3, direct-to-PDF. The group that maintains the collection builds a PDF search index of the entire collection by hand, using the standard Acrobat tool, which is a somewhat arduous task. So I'm being asked whether there's any way to automate building the Acrobat search index. I'm pretty sure PE can't help; at least, I couldn't find any mention of it in the PE Programmer's Guide. Have any of you created these indexes and perhaps found a better method (automating the Acrobat tool, third-party software, etc.)?

Thanks in advance,
Dave

    1 reply

    18-Opal
    January 24, 2012
    Hi David--

    DMP might be of use here. If you build a DMP project that includes all
    your PDFs, it will build a full-text search index as part of the DMC
    package it generates. You can of course search within the DMC viewer (or
    in the web app, if you produce a webapp.jar file). But the index it
    generates is a standard Lucene index, so you may be able to extract the
    index from the DMC build and integrate it with your own search
    interface, assuming that interface also uses a Lucene-compatible index.



    The PDFs you index this way do not need to be published using PE or
    Arbortext (though of course they can be).



    --Clay

    Clay Helberg
    Senior Consultant
    TerraXML
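
For readers unfamiliar with what a full-text search index does, the reply's idea of plugging an index into your own search interface can be illustrated with a toy inverted index. This is only a conceptual sketch in plain Python: it is not Lucene's on-disk format, not a DMP or Acrobat API, and it assumes the text has already been extracted from each PDF. The file names and sample text are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document names that contain it.

    `documents` is a dict of {name: extracted_text}. Extracting text
    from the PDFs themselves is out of scope for this sketch.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


# Hypothetical collection: names and text are made up for illustration.
docs = {
    "install.pdf": "Installing the pump assembly",
    "service.pdf": "Servicing the pump and replacing the filter",
}
index = build_index(docs)
print(search(index, "pump filter"))
```

A real index such as Lucene's adds ranking, stemming, and persistent storage on top of this basic term-to-document mapping, which is why a search front end has to be compatible with the specific index format it reads.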