Wondering if anyone can help. We are just starting our annual print project. It's a reference book of approx. 3500 pages, which we break down into chunks by letter of the alphabet, so each chunk is between 100 and 300 pages (some are 400 pages). Last year, with PE 7.0, it took approx. 15-20 min to generate approx. 150 pages. This year, it's taking almost an hour (we upgraded to 7.1; not sure if that has anything to do with it). At this rate, we'll never finish on schedule.
Every time we make a correction (at the pagination stage), we have to regenerate the PDF. An hour each time... yikes. Note that we comment out some of the topics as we go along to help with performance, but by the time you get to the end of a letter, you are still generating 100+ pages.
The PE engine is installed on its own server, a VM on a VMware ESX host:
24 GB memory, 2 CPUs (1 core/socket, 2 cores total), 1 NIC, 275.58 GB used space, 374.16 GB provisioned space.
From some research, it looks like we have only one sub-process, the default.
We have some ACL code that creates an RDS document. Before the RDS is created, we run some XSLs to resolve some of the xrefs; then the RDS is created via our ACL functions. Note that the ACLs haven't changed much since last year.
Any insights on how we could improve this process and its performance?
Any advice is greatly appreciated. Thank you.
Firstly, the reason that upgrading CPU core count and memory size doesn't help is that those are not the bottlenecks. The bottleneck is that PDF creation is single-threaded, which means it runs on only one CPU core. This is because you cannot lay out a PDF page without first laying out the page before it, to see where its text ended. So the PDF pages are assembled sequentially, one after another.
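To make that page-to-page dependency concrete, here is a toy paginator in Python. It is a minimal sketch under simplified assumptions (a hypothetical fixed page height in lines, paragraphs given only as line counts, paragraphs allowed to split across pages) and is not how APP works internally; it only shows why page N cannot be laid out until page N-1 has reported where its text ended.

```python
from typing import List, Tuple

# Hypothetical model: a "paragraph" is just its height in lines, and a page
# is a list of (paragraph_index, lines_placed_on_this_page) fragments.
Page = List[Tuple[int, int]]


def paginate(paragraph_lines: List[int], lines_per_page: int) -> List[Page]:
    """Flow paragraphs onto pages strictly one page at a time.

    The state carried between iterations (current paragraph and how many of
    its lines earlier pages already used) is the sequential dependency: each
    page starts exactly where the previous page stopped.
    """
    if lines_per_page <= 0:
        raise ValueError("lines_per_page must be positive")
    pages: List[Page] = []
    para = 0    # index of the paragraph currently being placed
    placed = 0  # lines of that paragraph already placed on earlier pages
    while para < len(paragraph_lines):
        page: Page = []
        room = lines_per_page
        while room > 0 and para < len(paragraph_lines):
            take = min(paragraph_lines[para] - placed, room)
            page.append((para, take))
            room -= take
            placed += take
            if placed == paragraph_lines[para]:
                # Paragraph finished: the next one starts on this page.
                para += 1
                placed = 0
        pages.append(page)
    return pages


if __name__ == "__main__":
    print(paginate([3, 5, 2], 4))
    # Making only the first paragraph one line longer changes every page:
    print(paginate([4, 5, 2], 4))
```

In this toy model, a one-line change to the first paragraph shifts every fragment after it, which mirrors why a correction early in a letter means regenerating the whole chunk.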
Without making any software changes, the only thing you can do in hardware to speed things up is to use a faster-clocked CPU on a more modern CPU architecture. We have noticed that in cloud computing, and even in modern data centres, the trend is towards higher CPU core counts rather than higher clock speeds, so you may struggle to find a faster CPU offering. Moving from a VM back to a physical server with the fastest CPU available is the best hardware option for performance. (We have seen our developers' laptops publish PDFs much faster than clients' massively expensive data-centre servers because of this difference.)
The more technically "proper" fix is to look more closely at the software. On the software side, you would need to profile the publishing process to gather metrics and identify where the slowdowns are occurring. We have seen instances where a single poorly written XPath query has slowed a system to a crawl. You may then be able to make tweaks or adjustments in Styler or ACL code to improve performance.
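As a starting point for gathering those metrics, here is a minimal timing-harness sketch in Python. It assumes each stage of the pipeline described earlier in the thread (XSL xref resolution, ACL-driven RDS creation, formatting and PDF generation) can be triggered separately from a script; the stage names and the `time.sleep` stand-ins are hypothetical placeholders, not Arbortext APIs. Only the pattern of timing each stage and ranking them by share of total time is the point.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Tuple


class StageTimer:
    """Accumulate wall-clock time per named stage of a publishing run."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises an error.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self) -> List[Tuple[str, float, float]]:
        """Return (stage, seconds, share of total), slowest stage first."""
        total = sum(self.totals.values()) or 1.0
        rows = [(name, secs, secs / total) for name, secs in self.totals.items()]
        return sorted(rows, key=lambda row: row[1], reverse=True)


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]]) -> StageTimer:
    """Run each (name, callable) stage in order, timing each one."""
    timer = StageTimer()
    for name, fn in stages:
        with timer.stage(name):
            fn()
    return timer


if __name__ == "__main__":
    # Hypothetical stand-ins: replace each lambda with whatever actually
    # triggers that step in your own environment.
    pipeline = [
        ("resolve_xrefs_xsl", lambda: time.sleep(0.05)),
        ("build_rds_acl", lambda: time.sleep(0.02)),
        ("format_and_pdf", lambda: time.sleep(0.10)),
    ]
    for name, secs, share in run_pipeline(pipeline).report():
        print(f"{name:20s} {secs:6.2f}s {share:6.1%}")
```

Timing whole stages like this only narrows down where to look; finding something like a slow XPath query inside the formatting stage would still need finer-grained measurement within that stage.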
Finally, we have moved some clients with very demanding needs from Styler publishing to pure APP publishing. This can give a performance increase of up to 10x, as we craft the publishing code to precisely suit the input data and PDF format.
This type of debugging, updating, and tuning does get quite involved. If you need help, there are commercial organisations that specialise in PE/APP publishing work. This is not a sales pitch, but our business, GPSL, is perhaps the most skilled in this area. Visit www.gpsl.co if you need more information.
It looks like we'll need to look at the software, because I ran many different tests with no success. Our hardware is fairly new, even though the servers are VMs. The CPUs run at 2.40 GHz per processor (4 vCPU x 2.40 GHz). We tested the performance while generating a 100-page PDF and it was OK. Currently, our stylesheet is using the APP engine. Is that different from Styler publishing? I ran the same test once with PE and once with Styler, and with Styler it was almost 50% faster, but it still takes 30 min for 100 pages (instead of 55 min).
We'll have a close look at the stylesheet to see if we can find what's slowing it down.
PE and Styler run the same publishing code under the covers. Styler stylesheets are basically a generic format that is "compiled" down to an executable form such as APP, FOSI, XSLT (for HTML), etc. For PDF output, only the APP engine is supported; FOSI remains available but is not supported. If you have no source edits, you can switch between APP and FOSI, but expect the output PDF to look different.
The 2.4 GHz CPUs are pretty standard for data-centre processing; they make perfect sense as a balance between power consumption, thermal profile and performance. To really boost things, you'd want a CPU with a base clock above 3 GHz, on the latest architecture. Essentially, look for the fastest available from the Xeon processors marked as "14nm" in the following list: https://en.wikipedia.org/wiki/List_of_Intel_Xeon_microprocessors
In our experience, it is pretty rare to find fast CPUs in data centres because of their higher power consumption and thermal output.
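To put rough numbers on the clock-speed advice, here is a back-of-the-envelope sketch in Python. It assumes, optimistically, that a single-threaded job's run time scales inversely with clock speed; it ignores per-clock (IPC) gains from newer architectures, turbo behaviour and memory speed, which can move the real result either way. The 55 min / 100 pages / 2.4 GHz figures are the ones reported in this thread; the 3.0 and 3.6 GHz values are just illustrative targets.

```python
def estimated_minutes(current_minutes: float, current_ghz: float, new_ghz: float) -> float:
    """First-order estimate of run time on a CPU with a different clock.

    Assumption: single-threaded run time is proportional to 1 / clock speed.
    Architecture, cache and memory differences are ignored, so treat the
    result as a ballpark figure only.
    """
    if current_ghz <= 0 or new_ghz <= 0:
        raise ValueError("clock speeds must be positive")
    return current_minutes * current_ghz / new_ghz


if __name__ == "__main__":
    # 55 min for 100 pages at 2.4 GHz, as reported in this thread.
    for ghz in (2.4, 3.0, 3.6):
        print(f"{ghz:.1f} GHz -> {estimated_minutes(55, 2.4, ghz):.1f} min")
```

By this estimate, a 3.0 GHz part would bring the 55-minute run down to about 44 minutes, so a faster clock alone helps but does not close a gap of several times.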
I forgot to post an update on this issue. In our testing, we found that performance in 7.0 was much better than in 7.1. PTC was able to replicate the issue and is working on a solution.
Regarding FOSI, for the record this problem would not occur with a FOSI stylesheet (with or without Styler). Documents with thousands of pages can be formatted quickly in their entirety with FOSI.
For example, my book Practical FOSI (for sale at fosiexpert.com/Practical-FOSI.html) has >900 pages, including hundreds of tables, graphics, TOCs, and cross-references plus a three-level index. It formats (using Print Composer on a laptop) in ~3 minutes, and PDF creation takes about the same amount of time.
If desired, a FOSI stylesheet can be coded with a formatting pass reduction feature that increases the already fast formatting speed. (Formatting pass reduction is described in Arbortext Help and in my book.)
Otherwise, optimization for speed is not an issue when developing a FOSI stylesheet. Optimization is handled by the Arbortext software.