PE Question: Is 38,000 Pages Too Big?

ptc-3050150
1-Newbie

We are about to run a 38,000 page book through the PE.

Has anyone done a book that big before?

Anyone have a clue as to hardware requirements? RAM? Pagespace settings? Have a guess as to how long it will take?

John T. Jarrett
BAE Systems | Arbortext version 5.4 | LOGSA XSL-FO v 1.5

14 REPLIES 14

Just for what it's worth.

20,000 page book in 5.1 two years ago (older server, can't tell you config) -- nearly 24 hours
same book in 5.3 one year ago, much newer server -- 6 hours
one chapter of same book, 2800 pages, but much more generated text, 5.3, 6 months ago -- 2 hours
another chapter of same book, 3000 pages, much more generated text, 5.3, 6 months ago -- would not run as a single instance, ran half at a time

Far too many variables to give you any real idea. The large instances that ran were essentially text and graphics, with little generated output (e.g. no change bars, no accumulated highlights info, etc.). The chapter that would not run as a single instance could be run by removing all the 'highlights'-generating elements. All generated text is significant server load, so it makes a big difference very quickly.
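
For a rough sense of scale, the data points above can be turned into pages per hour. This is only back-of-envelope arithmetic in Python: the linear extrapolation to 38,000 pages is an assumption made for illustration, and as noted, real jobs do not scale that predictably.

```python
# Data points quoted above: (label, pages, hours). "Nearly 24 hours" is rounded to 24.
runs = [
    ("5.1, older server", 20_000, 24),
    ("5.3, newer server", 20_000, 6),
    ("5.3, one chapter, heavy gentext", 2_800, 2),
]

TARGET_PAGES = 38_000  # size of the book asked about


def pages_per_hour(pages, hours):
    return pages / hours


def naive_hours(target_pages, pages, hours):
    """Linear extrapolation (an assumption; real scaling may well be worse)."""
    return target_pages / pages_per_hour(pages, hours)


for label, pages, hours in runs:
    rate = pages_per_hour(pages, hours)
    estimate = naive_hours(TARGET_PAGES, pages, hours)
    print(f"{label}: {rate:,.0f} pages/hour, ~{estimate:.1f} h for {TARGET_PAGES:,} pages")
```

Even the best case here is well over a working day's worth of waiting per attempt, which is worth knowing before planning test runs.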

As I said, for what it's worth. 😕
Steve Thompson
+1(316)977-0515
When the only tool you have is a hammer,
Everything looks like a hard disk...
Alessio
15-Moonstone
(To:ptc-3050150)

Did you already rule out the option of publishing several parts as different PDF documents and then delivering them together as a Digital Media Publisher application indexed as a whole for easier search experience? This approach would enable you to publish multiple parts concurrently on an SMP machine, which would not be possible were you publishing a single PDF. How big in GB would the final PDF be? How many graphics are there, and how big is their total size? FOSI or XSL?

Not using DMP

Army/TACOM/LOGSA XSL-FO

Volume breaks every 1500 pages (25-26 volumes total)

Assuming one huge PDF we will have to split once produced

Tons of medium-sized graphics (WMFs, maybe 100 KB each on average)

Assume an average of two xref to another part of the book per page (maybe much higher, just guessing)

TOCs by book and volume (indexes don't work yet - first time THAT might actually be a good thing)

Thanks! And "for what it's worth" is a whole lot better than nothing.

John T. Jarrett CDT
Tech Writer II, Tech Pubs, ILS,Land & Armaments/Global Tactical Systems

T 832.673.2147 ext 1147 | M 512.736.7031
BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
www.baesystems.com

Internally within PTC/Arbortext R&D we have some test documents that are in the 50,000 page range and format successfully using a FOSI. These documents are pretty simple, not much gentext, no tables that extend across many pages, and so on. So if your documents are similar and if you were using a FOSI then you would have good odds but how fast would just be a guess on my part.

Given that you are using XSL-FO, our experience is that much smaller documents will usually run out of Java memory space during the transformation step of FO publishing. I suggest that you spend some time testing with smaller documents to see where your limits are going to be. We cannot give you a good idea since how soon the Java machine runs out depends so much on exactly what the FO stylesheet has in it.

John Dreystadt
Software Development Director
Arbortext - PTC
734-352-2835

Another data point to consider: using XSL-FO our team found that documents in excess of 2000 pages sometimes took 6 or more hours to render. This was using PE 5.2 on Solaris 9 with Adobe Distiller server, stylesheets did some generated text. Source consisted mainly of text, CALS tables, and on average a graphic every 3 pages; note however that results varied considerably from document to document, there was no clear way to predict what exactly affected composition times. Java virtual memory was an issue as John mentioned. The sluggishness prompted us to switch to FOSI.

Todd Nowlan
Project Manager, Knowledge Services Information Technology
Certified Lean Six Sigma Green Belt
Nortel
rtnowlan@nortel.com
Telephone +1 613 763 4873 / ESN 393 4873

Well, we can't get our 38,000-pager to run all the way through.

This is the current error message:

<data name="errorMessage" value="com.arbortext.e3.E3RequestError: Subprocess terminated unexpectedly while executing the composition operation."/>
<data name="outputCode" value="500"/>

This is only slightly better than getting the "General XPath error."

In order to keep it from timing out (with Susan Fort's help - thanks!) I've set the subprocess pool as follows:

<subprocesspool default="yes" id="pool-default" maxsubprocesses="2" maxBusyInterval="0" minSubprocesses="1" maxLifetime="0">

Quad core server, 4 GB RAM, 14 GB page file (the standard 2 min/4 max GB setting was not enough). Crashes after about 4 hours.

I tried without the appendices to cut down on gentext - no change.

I'm saving the temp files and they all appear to be complete - x2.xml as well as the intermediate and the outputdir with ati tags. The x2 is 40 GB and the intermediate/outputdir file is 450 GB <-- can't even open it with Arbortext. I was hoping they were crashing and I could tell where, but each has an ending </production> or </fo:root> tag as appropriate.
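
A closing tag at the end of a file does not by itself show that everything before it is intact. For files too large to open in an editor, a streaming parse can confirm well-formedness from start to finish without loading the whole document. The following is a minimal sketch using only Python's standard library; the function name `scan_xml` is hypothetical, and it checks well-formedness only, not validity against a DTD or the FO vocabulary.

```python
import xml.etree.ElementTree as ET


def scan_xml(path):
    """Stream-parse a file; return (ok, elements_closed, error_message)."""
    closed = 0
    try:
        events = ET.iterparse(path, events=("start", "end"))
        _event, root = next(events)  # first event is the start of the root element
        for event, _elem in events:
            if event == "end":
                closed += 1
                # Discard what has been read so far so memory use stays flat.
                root.clear()
    except ET.ParseError as err:
        # err.position is the (line, column) of the first well-formedness error.
        line, column = err.position
        return False, closed, f"line {line}, column {column}: {err}"
    return True, closed, None
```

If the parse fails, the reported line and column give a place to start looking; if it succeeds, the crash is more likely a resource limit than a truncated intermediate file.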

Other than rqbody.dat (which apparently I don't know how to open), I've looked at all the transaction files and find no help at all.

We just did a test on a book that is half that size (still in the huge department) and are back to the General XPath Error, even though our contractor has published it in the past with 5.3 and the older LOGSA v1.4 stylesheets (which have badly formed, jumbled-up tables).

Any ideas on what to try next? At 4 hours to a crash, this is turning into a really long process.

John T. Jarrett
BAE Systems | Arbortext version 5.4 | LOGSA XSL-FO v 1.6

4 hour failures are not conducive to debugging ... :(

I forget, have you told us whether you can successfully process half? Or
quarters? Or eighths?

Do you know if the increase in processing time is linear?

Does Arbortext suggest any logging strategies? (These obviously will add
their own overhead, you want to run with the smallest possible failing job
...)
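
One way to answer the linearity question from a few partial runs is to fit a power law, hours = c * pages^k, and look at the exponent k. A minimal sketch follows; the timings in `samples` are made up purely for illustration and should be replaced with real measurements.

```python
import math


def scaling_exponent(samples):
    """Least-squares slope of log(hours) against log(pages).

    A slope near 1 suggests roughly linear growth; a slope well above 1
    suggests the job gets disproportionately slower as it grows.
    """
    xs = [math.log(pages) for pages, _hours in samples]
    ys = [math.log(hours) for _pages, hours in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical timings for an eighth, a quarter, and half of the book: (pages, hours).
samples = [(4_750, 0.5), (9_500, 1.4), (19_000, 4.0)]
print(f"scaling exponent ~ {scaling_exponent(samples):.2f}")
```

At least three sizes are needed for the fit to mean much, and the runs should use the same stylesheet and graphics settings so that only the page count changes.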

(In reply to John Jarrett's message of Thu, Feb 25, 2010 at 1:37 PM.)

I would extend what you have started. Split the problem up and see if
you have an issue in the content vs a size problem. You might have a
nasty table or something that pushes you over some limit.
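
With runs that take hours, it helps to split systematically rather than by guesswork. The sketch below bisects an ordered list of work packages to find the shortest prefix that fails, then publishes the last item on its own to separate a content problem from a size problem. `publishes_ok` is a hypothetical callback standing in for "submit this subset to PE and report success"; nothing here calls a real PE API.

```python
def first_failing_prefix(items, publishes_ok):
    """Return the smallest n such that items[:n] fails, or None if all pass.

    Assumes failure is monotonic: once a prefix fails, every longer prefix
    fails too (true for a single bad item or a hard size limit).
    """
    if publishes_ok(items):
        return None
    lo, hi = 0, len(items)  # invariant: items[:lo] works, items[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if publishes_ok(items[:mid]):
            lo = mid
        else:
            hi = mid
    return hi


def diagnose(items, publishes_ok):
    n = first_failing_prefix(items, publishes_ok)
    if n is None:
        return "whole set publishes"
    suspect = items[n - 1]
    if publishes_ok([suspect]):
        # The last item added is fine alone, so the total size is the trigger.
        return f"size-related: fails once {n} items are included"
    return f"content-related: {suspect!r} fails on its own"
```

For a chapter of about 900 sub-documents this needs roughly ten publishing runs instead of hundreds, though several bad items at once would need the search repeated after each fix.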

Have you logged a call with support?

..dan

Thanks there Paul and Dan. Should have thought of carving it up...guess it comes from being an "only one" in the shop and not having anyone else to bounce stuff off of to get the brain clicking again.

I cut out almost all of the individual documents (to us, the work packages that make up the meat of the full doc, some 37,000 pages) and 593 pages printed, 79 MB PDF - but there were no graphics. That took about five minutes.

I turned one chapter back on, some 900 sub-docs, and it crashed, again. Lots of graphics not found and then a non-fatal "subprocess terminated unexpectedly" combined with a graphic file not found and then after a few more missing graphics warnings, a "Java heap space" error.

That's interesting because I'm running it from the PE's own editor.exe on the same machine as the PE...and the graphics appear in editor fine.

I should note this is the same PE where Compose | PDF File doesn't show graphics but Print Composed to a PDF print driver does...so I'm using a PDF driver rather than the PE's PDF capabilities. We are also not using Distiller - I've seen that option around, but it is not something we've invested in.

I'll see if I can do something about the graphics and fire it up again. Thanks again - and does that trigger any more suggestions?

John T. Jarrett CDT
Tech Writer II, Tech Pubs, ILS, Land & Armaments/Global Tactical Systems

T 832.673.2147 ext 1147 | M 512.736.7031
BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
www.baesystems.com

Ah, and pesubprocess.log says I ran out of virtual memory space again. Haven't seen that file in the transaction zip before.

John T. Jarrett CDT
Tech Writer II, Tech Pubs, ILS, Land & Armaments/Global Tactical Systems

T 832.673.2147 ext 1147 | M 512.736.7031
BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
www.baesystems.com

A couple of possible next steps (although I'd definitely open a call
w/PTC-Arbortext support and get their input on the failures, log messages,
etc.):

Try running larger and larger fragments without graphics. We have had
failures (sorry, I can't remember the failure messages) where one "badly
formed" graphic failed the whole job. Typically they were larger than other
graphics from a filesize perspective. I think ultimately we found they had
accidentally been saved with an insanely high resolution.

Try working with just that chapter and chopping it down until it works. Try
to determine whether it is a size issue or whether some part of that chapter
is bad.

Try running the rest of the document BUT that chapter. (Or, turning on a
different chapter.)

How are your sub-docs included? Are they actual in-line XML content,
includes, chunks assembled by a CMS? Maybe flattening the document (in a
copy of course) would reduce some overhead that is tripping up PE if they
are somehow being assembled, especially if Editor/PE is doing that.


(In reply to John T. Jarrett's message of Fri, Feb 26, 2010 at 5:01 AM.)

Research the javavmmemory setting. Experiment with this setting (create a
set_javavmmemory.acl and store it in ...custom/init). I have gotten
fluctuating contradictory advice from support on this setting, but setting
or changing my setting has definitely affected some problems. As for the
varying advice, I do not know whether best practices have changed, the core
code it manipulates has changed, my hardware and/or software environment has
changed, or some combination of those.

Most of our "mystery" publication failures have been due to poorly-formed/poorly-updated art files. One thing you can do is to ask the writer/editor of the affected file whether he or she has used updated graphics, or has updated (or had someone else update) any illustrations. You can then start by concentrating on the appropriate graphic files and see if any of them are too big, have too much resolution, have graphic objects way outside of the bounding box, etc.
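
A quick first pass over a large graphics folder is to flag files that are far bigger than the rest. The sketch below is one way to do that in Python; the `.wmf` default matches the file type mentioned in this thread, while the `factor` threshold and the function name are assumptions. File size is only a proxy: it will not catch objects outside the bounding box or other structural problems.

```python
import os
import statistics


def oversized_graphics(root_dir, extensions=(".wmf",), factor=10):
    """List (path, size_bytes) for files far larger than the median size."""
    sizes = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in files:
            # str.endswith accepts a tuple, so several extensions can be checked at once.
            if name.lower().endswith(extensions):
                path = os.path.join(folder, name)
                sizes.append((path, os.path.getsize(path)))
    if not sizes:
        return []
    median = statistics.median(size for _path, size in sizes)
    outliers = [(path, size) for path, size in sizes if size > factor * median]
    # Largest first, so the most suspicious files are at the top.
    return sorted(outliers, key=lambda item: item[1], reverse=True)
```

Anything this reports is a candidate for opening in the authoring tool and checking the resolution it was saved at.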

At 11:59 AM 3/1/2010, you wrote:
>How are your sub-docs included? Are they actual in-line XML content,
>includes, chunks assembled by a CMS? Maybe flattening the document
>(in a copy of course) would reduce some overhead that is tripping up
>PE if they are somehow being assembled, especially if Editor/PE is doing that.

Not that it was working with PE, but a document with lots of includes
and the change tracking would just about kill the editor with only a
few thousand pages. Once I flattened it, the document opened in seconds.
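
As an illustration of what flattening can look like outside the editor, here is a minimal sketch that expands XInclude references into a single file using Python's standard library. It assumes the sub-documents are pulled in with `xi:include`, which this thread does not confirm; file entities or CMS-assembled chunks would need a different approach. The function name `flatten` is hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def flatten(main_path, out_path):
    """Write a copy of main_path with its xi:include elements expanded."""
    base_dir = os.path.dirname(os.path.abspath(main_path))

    def loader(href, parse, encoding=None):
        # Resolve each href relative to the main document's folder.
        target = os.path.join(base_dir, href)
        if parse == "xml":
            return ET.parse(target).getroot()
        with open(target, encoding=encoding or "utf-8") as handle:
            return handle.read()

    tree = ET.parse(main_path)
    ElementInclude.include(tree.getroot(), loader=loader)
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
```

ElementTree discards the DOCTYPE, comments, and processing instructions, so the output is best treated as a throwaway test input rather than a replacement for the source.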

..dan
---------------------------------------------------------------------------
Danny Vint

Panoramic Photography