1-Visitor
February 12, 2010
Question

PE Question: Is 38,000 Pages Too Big?


We are about to run a 38,000 page book through the PE.

Has anyone done a book that big before?

Anyone have a clue as to hardware requirements? RAM? Pagespace settings? Have a guess as to how long it will take?

John T. Jarrett
BAE Systems | Arbortext version 5.4 | LOGSA XSL-FO v 1.5

    14 replies

    1-Visitor
    February 12, 2010
    Just for what it's worth.

    20,000 page book in 5.1 two years ago (older server, can't tell you config) -- nearly 24 hours
    same book in 5.3 one year ago, much newer server -- 6 hours
    one chapter of same book, 2800 pages, but much more generated text, 5.3, 6 months ago -- 2 hours
    another chapter of same book, 3000 pages, much more generated text, 5.3, 6 months ago -- would not run as a single instance, ran half at a time

    Far too many variables to give you any real idea. The large instances that ran were essentially text and graphics, with little generated output (no change bars, no accumulated highlights info, etc.). The chapter that would not run as a single instance did run once all the 'highlights'-generating elements were removed. Generated text is a significant server load, so it makes a big difference very quickly.

    As I said, for what it's worth. 😕
    Steve Thompson
    +1(316)977-0515
    When the only tool you have is a hammer,
    Everything looks like a hard disk...
    15-Moonstone
    February 12, 2010
    Did you already rule out the option of publishing several parts as different PDF documents and then delivering them together as a Digital Media Publisher application, indexed as a whole for an easier search experience? This approach would let you publish multiple parts concurrently on an SMP machine, which would not be possible if you were publishing a single PDF. How big in GB would the final PDF be? How many graphics are there, and what is their total size? FOSI or XSL?
    1-Visitor
    February 12, 2010
    Not using DMP

    Army/TACOM/LOGSA XSL-FO

    Volume breaks every 1500 pages (25-26 volumes total)

    Assuming one huge PDF we will have to split once produced

    Tons of medium-sized graphics (WMFs, maybe 100 KB each on average)

    Assume an average of two xrefs to another part of the book per page (maybe much higher, just guessing)

    TOCs by book and volume (indexes don't work yet - first time THAT might actually be a good thing)

    Thanks! And "for what it's worth" is a whole lot better than nothing.

    John T. Jarrett CDT
    Tech Writer II, Tech Pubs, ILS,Land & Armaments/Global Tactical Systems

    T 832.673.2147 ext 1147 | M 512.736.7031
    BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
    www.baesystems.com

    February 12, 2010
    Internally within PTC/Arbortext R&D we have some test documents that are in the 50,000 page range and format successfully using a FOSI. These documents are pretty simple, not much gentext, no tables that extend across many pages, and so on. So if your documents are similar and if you were using a FOSI then you would have good odds but how fast would just be a guess on my part.

    Given that you are using XSL-FO, our experience is that much smaller documents will usually run out of Java memory space during the transformation step of FO publishing. I suggest that you spend some time testing with smaller documents to see where your limits are going to be. We cannot give you a good idea since how soon the Java machine runs out depends so much on exactly what the FO stylesheet has in it.

    John Dreystadt
    Software Development Director
    Arbortext - PTC
    734-352-2835
    1-Visitor
    February 16, 2010

    Another data point to consider: using XSL-FO, our team found that documents in excess of 2000 pages sometimes took 6 or more hours to render. This was with PE 5.2 on Solaris 9 with Adobe Distiller server, and the stylesheets produced some generated text. The source consisted mainly of text, CALS tables, and on average a graphic every 3 pages. Note, however, that results varied considerably from document to document, and there was no clear way to predict what exactly affected composition times. Java virtual memory was an issue, as John mentioned. The sluggishness prompted us to switch to FOSI.

    Todd Nowlan
    Project Manager, Knowledge Services Information Technology
    Certified Lean Six Sigma Green Belt
    Nortel
    rtnowlan@nortel.com
    Telephone +1 613 763 4873 / ESN 393 4873

    1-Visitor
    February 25, 2010

    Well, we can't get our 38,000 pager to run all the way through.

    This is the current error message:

    <data name="errorMessage" value="com.arbortext.e3.E3RequestError: Subprocess terminated unexpectedly while executing the composition operation."/>
    <data name="outputCode" value="500"/>

    This is only slightly better than getting the "General XPath error."

    In order to keep it from timing out (with Susan Fort's help - thanks!) I've set the subprocess pool as follows:

    <subprocesspool default="yes" id="pool-default" maxsubprocesses="2" maxBusyInterval="0" minSubprocesses="1" maxLifetime="0">

    Quad core server, 4 GB RAM, 14 GB page file (the standard 2 min/4 max GB setting was not enough). Crashes after about 4 hours.

    I tried without the appendices to cut down on gentext - no change.

    I'm saving the temp files and they all appear to be complete - x2.xml as well as the intermediate and the outputdir with ati tags. The x2 is 40 GB and the intermediate/outputdir file is 450 GB <-- can't even open it with Arbortext. I was hoping the files would be cut off at the crash so I could tell where it happened, but each has an ending </production> or </fo:root> tag as appropriate.
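
For files too large to open in an editor, the closing-tag check described above can be done by reading only the last few kilobytes. A minimal Python sketch, assuming the temp files are UTF-8 or another ASCII-compatible encoding; the function name and the 4096-byte window are illustrative choices, not part of any Arbortext tooling:

```python
import os

def ends_with_tag(path, closing_tag, window=4096):
    """Return True if closing_tag occurs in the last `window` bytes of the file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Jump close to the end instead of reading the whole file.
        f.seek(max(0, size - window))
        tail = f.read()
    return closing_tag.encode("utf-8") in tail
```

A call such as `ends_with_tag("x2.xml", "</production>")` only shows that the file was written to the end. It does not prove that the XML in between is well-formed.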

    Other than rqbody.dat (which apparently I don't know how to open), I've looked at all the transaction files and find no help at all.

    We just did a test on a book that is half that size (still in the huge department) and are back to the General XPath Error, even though our contractor has published it in the past with 5.3 and the older LOGSA v1.4 stylesheets (which have badly formed, jumbled-up tables).

    Any ideas on what to try next? At 4 hours to a crash, this is turning into a really long process.

    John T. Jarrett
    BAE Systems | Arbortext version 5.4 | LOGSA XSL-FO v 1.6

    1-Visitor
    February 25, 2010
    4 hour failures are not conducive to debugging ... :(

    I forget, have you told us whether you can successfully process half? Or
    quarters? Or eighths?

    Do you know if the increase in processing time is linear?

    Does Arbortext suggest any logging strategies? (These obviously will add
    their own overhead, you want to run with the smallest possible failing job
    ...)
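
One rough way to answer the linearity question is to time a few successful runs of different sizes and fit the exponent k in time ≈ c · pages^k. A minimal Python sketch, assuming the runs are made on the same server with the same stylesheet; the timings below are made-up placeholders, not measurements from this thread:

```python
import math

def scaling_exponent(runs):
    """Least-squares slope of log(time) against log(pages).

    runs is a list of (pages, elapsed_time) pairs from successful jobs.
    A slope near 1 suggests linear growth; well above 1 suggests
    that larger jobs get disproportionately slower.
    """
    xs = [math.log(pages) for pages, _ in runs]
    ys = [math.log(elapsed) for _, elapsed in runs]
    n = len(runs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical (pages, minutes) pairs, for illustration only.
runs = [(500, 5.0), (1000, 10.0), (2000, 20.0)]
print(round(scaling_exponent(runs), 2))
```

At least three sizes are needed before the slope says much, and runs that crash should be left out rather than entered with their time-to-failure.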

    1-Visitor
    February 26, 2010
    I would extend what you have started. Split the problem up and see if
    you have an issue in the content vs a size problem. You might have a
    nasty table or something that pushes you over some limit.
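
The content-versus-size split can be run as a bisection over the list of sub-documents. A minimal Python sketch, assuming a hypothetical `composes_ok(subset)` callable that runs one composition job on the given sub-documents and returns True on success (no such function exists in PE; it stands in for however the job is launched):

```python
def find_culprit(items, composes_ok):
    """Narrow a failing list of sub-documents down to one item.

    Returns the single failing item, or None when the full list passes
    or when both halves pass on their own, which points at a size limit
    rather than at one piece of content.
    """
    if not items or composes_ok(items):
        return None
    while len(items) > 1:
        mid = len(items) // 2
        left, right = items[:mid], items[mid:]
        if not composes_ok(left):
            items = left
        elif not composes_ok(right):
            items = right
        else:
            # Each half works alone, so no single item is to blame.
            return None
    return items[0]
```

Each step costs one or two composition runs, so a list of about 900 sub-documents needs roughly 10 to 20 runs. The sketch follows only the first failing half, so it finds one bad item at a time.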

    Have you logged a call with support?

    ..dan

    1-Visitor
    February 26, 2010
    Thanks there Paul and Dan. Should have thought of carving it up...guess it comes from being an "only one" in the shop and not having anyone else to bounce stuff off of to get the brain clicking again.

    I cut out almost all of the individual documents (to us, the work packages that make up the meat of the full doc, some 37,000 pages) and 593 pages printed, 79 MB PDF - but there were no graphics. That took about five minutes.

    I turned one chapter back on, some 900 sub-docs, and it crashed again. There were lots of "graphic not found" warnings, then a non-fatal "subprocess terminated unexpectedly" combined with a graphic file not found, and then, after a few more missing-graphics warnings, a "Java heap space" error.

    That's interesting because I'm running it from the PE's own editor.exe on the same machine as the PE...and the graphics appear in editor fine.

    I should note this is the same PE where Compose | PDF File doesn't show graphics but Print Composed to a PDF print driver does...so I'm using a PDF driver rather than the PE's PDF capabilities. We are also not using Distiller - I've seen that option around, but it's not something we've invested in.

    I'll see if I can do something about the graphics and fire it up again. Thanks again - and does that trigger any more suggestions?

    John T. Jarrett CDT
    Tech Writer II, Tech Pubs, ILS, Land & Armaments/Global Tactical Systems

    T 832.673.2147 ext 1147 | M 512.736.7031
    BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
    www.baesystems.com
    1-Visitor
    February 26, 2010
    Ah, and pesubprocess.log says I ran out of virtual memory space again. Haven't seen that file in the transaction zip before.

    John T. Jarrett CDT
    Tech Writer II, Tech Pubs, ILS, Land & Armaments/Global Tactical Systems

    T 832.673.2147 ext 1147 | M 512.736.7031
    BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
    www.baesystems.com