Arbortext Export to MS Word, reprise

MartyRoss
1-Visitor

Hi,


We have a business requirement to export documents from Arbortext 5.4 M020 directly into MS Word format. What is the best we can do for this?


From what I've read, export to MS Word from Arbortext is only supported via RTF, taking care to use special "Word fields", "styles", "instructions" and/or stylesheets. And, even then, the Arbortext Styler Guide & Help files suggest there are plenty of limitations and "gotchas" lurking along this path.


One of our "power" users suggests she has used elsewhere a feature / add-on to EPIC / Editor that can be used to directly export to, or compose into, a Microsoft Word (i.e., ".doc") format. Does this sound familiar to anyone? What is she talking about? She can't remember the feature, but believes it may be a separately-licensed add-on.


Any tips or leads will be carefully followed up!


Regards,


-- Marty


13 REPLIES

There is an Import/Export product that would do this.

I would also look into the XML capabilities of Word and see what it would
take to open the XML directly in Word. Word is supposed to support
arbitrary XML; I think the only caveat is that it needs a schema to do this. I
believe you would basically set up a set of styles in Word for the XML you
have and do all the work within Word.

A lot would rest on how you want to format within Word - how fancy does it
need to be? How pretty in terms of page breaks and final layout. Either of
these approaches will benefit from some hand tweaking.

..dan

We've used the Nuance OmniPage17 software to convert .pdf files that came from Arbortext into MS Word.

We wrote our own XML-to-Word conversion. It's XSLT and converts the XML in our custom schema directly to the Open XML standard for Word 2007 (.docx). Not sure I would recommend anyone going down this path, however (it was a long and difficult venture). If you can generate a PDF, there are some excellent 3rd-party tools that convert PDF to .docx, .pptx, etc.

Dave

Hi Dan & Marty--

Dan, Import/Export is the thing Marty refers to that uses RTF as an
intermediate format for exporting to Word. I also hadn't heard that Word
is now some kind of general XML editor--do you have any pointers to web
sites or other resources that describe this? I know it uses XML under
the hood for its new .docx format, but I don't know about editing for
arbitrary schemas. In that case does it enforce the schema constraints
to prevent you from making the document invalid?

Exporting to Word or round-tripping from Word is something that business
users frequently ask for, because they don't quite understand how XML
authoring is different from Word authoring. One of the reasons that it's
hard to solve this problem of getting XML data to and from Word is
because they *are* very different things. In most cases the right
solution is to try to educate the business users about the differences
(and the advantages of structured authoring, of course), so they
understand why sticking their content in Word is a Bad Idea (TM).

As a workaround, if you are just trying to get something users can open
in Word for the sake of reviewing, and approximate formatting is good
enough, you might try publishing to HTML and then importing the HTML
into Word. We used that process where I used to work and it was a
reasonable compromise for users who didn't have Arbortext and just
wanted to be able to mark up a file with review comments.

--Clay

Clay Helberg
Senior Consultant
TerraXML

> We've used the Nuance OmniPage17 software to convert .pdf files that came
> from Arbortext into MS Word.
>
>

My experience with output from tools like that is it looks good but it is
not in a form that you want to edit. I've found that they tend to make a
graphic page to try and maintain page content and line breaks (page
fidelity). By making all these text blocks though, you can't edit a
continuous document.

Ultimately it comes down to your definition of being in Word and the final
use for that format.

..dan

Speaking of which, does anyone have a map for Arbortext to Word for MIL SPEC 38784C-BV7? (never hurts to ask)

Hi Marty,



Usually I have the problem of going the other way, from Word to SGML
into our custom DTD. No two Word documents are ever the same. 🙂



I'd be thinking about a couple of things:



Is the intent to round trip the content as part of some editing cycle?
Losing the value of your markup is definitely an issue and indicates a
complete lack of understanding by the customer. My concern would be that
you will lose your valuable markup on export, and if the document is
edited and returned, you face any amount of non-compliant styles which
have suddenly become part of the Word document for any potential return
trip.



How much do I want to fight with this to get the output into Word?
Workarounds always cost time and money in the end.



What does the output to Word have to look like? I sometimes drop my
markup into a text editor and clean it up with Omnimark if people don't
want to see any markup. I can insert that content into any text editor.
I believe it is possible to create a "styled" RTF file using Omnimark,
but again you have to create that scripting yourself. I haven't tried it
myself.



Most often I find that people will prefer a PDF information product
generated from Arbortext's software. Acrobat Professional will convert
such documents to MS Word, but not nicely in all cases, and that often
forces an attempt to use OCR software, which can introduce errors into
the text. I really don't recommend going that way since you can't be
certain the content is correct without a review process, and what then
if mistakes are found?



I just recently had a conversation with PTC regarding Arbortext
Import/Export, some of which I'll pass on here. It can be obtained as a
plug-in for the editor. It will let you export XML documents to RTF for
use by any application that opens the RTF format. I quote the
documentation here, "Using Arbortext Import/Export, you can export XML
documents to RTF based on stylesheet definitions you create using
Arbortext Styler. Arbortext Styler must be installed and licensed to
create export stylesheets and to preview RTF documents being exported
with Arbortext Import/Export." It looked like a pretty good tool to me. At
the moment I have neither, but I'd like to have them. 😉



Greg


For what it's worth, we've done the same thing (using Word 2003 WordML) for several custom document types and it works well, and wasn't too awful to set up. But they're fairly simple document types, so your mileage may vary.

-James

Marty:



Arbortext Export has both advantages and limitations. The viability often
depends on who the consumer of the Word document is to be.



If the purpose is ...

1. content review, such as review by subject matter experts who do not have
(nor want) an XML Editor,

2. and if PDF review is considered unwieldy,

3. and you are using a Styler stylesheet...



then Arbortext Export can be a great solution.

Granted those are three qualifiers, but often they are all true at the same
time.



The RTF output, by and large, looks identical to Word .doc or .docx files,
and looks similar to PDF output from the same stylesheet. There is a
limitation that Word graphics which need to be scaled must also be embedded
rather than linked, but if item # 1 (above) is true, you probably want
embedded graphics in order to provide one RTF file (like PDF) rather than a
collection of a .doc file plus many image files (like HTML output). There
are other limitations, such as the export of tables within tables, but
that's not usually an issue.

Out of the box....

- a Styler stylesheet for your document type will export RTF that
generally looks and feels like PDF output from Styler.

- it will automatically create a Table of Contents as a TOC field,
but the TOC will have to be manually (or programmatically) generated (once)
by Word or by the person viewing the file using Word's "Update Fields"
command (CTRL-A, followed by F9)

- a single checkbox in Styler allows you to embed all graphics (that
Word will support as embedded graphics)

- a single checkbox allows you to generate the Word built-in list
styles for numbered lists and bullet lists

- Links, cross-references, footnotes, tables, index terms, and
indexes all work out of the box, although indexes have the same one-time
limitation as TOCs because INDEX fields generate the Index tables (and
there is no automatic updating of certain generated text in Word).

- Headers and footers as defined by Styler, although there are a few
limitations



If the goal is page-by-page fidelity with your Styler-generated PDF output,
RTF Export will be frustrating.



If the purpose is for RTF to be the primary composition output for your
documents, RTF Export is not the best choice.



If smarter, advanced Word documents are the goal, that is to say, Word
documents which use Word styles to do the things in a Word-like manner, you
can do that using Styler RTF-specific features. Block elements can be
mapped to built-in or user-defined Word styles. Inline elements can be
mapped to Word character styles. TOCs, Indexes, Word fields, and Headers
and Footers can be built according to Word specifications rather than
standard Styler practices. Customer-derived Word templates (.dot) files can
be embedded in, and referenced by, the RTF output.



This more advanced level of effort is more applicable if the consumer of the
Word doc is going to be making extensive changes to the RTF file, adding new
content, new headings, new lists, etc. I don't mean to imply that
round-tripping the information back to Editor is a good idea without
realizing that round-tripping is a long and winding (expensive) road, and is
sometimes an impassable road. If planned carefully within the boundaries of
Import/Export limitations, Word limitations, and the limitations of logic :-),
round-tripping can be useful in a review process, but there are significant
hurdles.



So the bottom line is, if your use case fits and you already have a Styler
stylesheet, then Arbortext Export is probably a short path to nice-looking
Word documents. They cannot have the same degree of formatting complexity
as PDF output from Arbortext, because Word is, after all, merely a flat
paragraph-based word processor. There are limitations, many of which are
defined by Word's feature set.



Hopefully this helps you too, along with all the other info you have seen
today.



Mark Lambert

Oberon Technologies, Inc.




Hi Marty: Re. "from Arbortext 5.4 M020 directly into MS Word format": I wouldn't start from there. Can your requirement be restated as "from XML into OOXML" (docx, Word 2003 and later) or "from XML into RTF" (Word 97 and earlier)? I think couching this in app-to-app terms is problematic, as you're limited/tied to the apps themselves. XML is ideal for converting to other formats, so I would prefer to couch the problem in terms of "format-to-format" rather than "app-to-app". If you state the problem in those terms, I think it becomes more achievable, depending on your resources 🙂


We have for years converted XML to both RTF (just give the documents a .doc extension) and, more recently, to OOXML (docx). You don't mention round-tripping so I assume it's not a requirement. Just as well, as AFAIK no-one has yet managed it without unacceptable loss of data/quality. Personally I don't think it's been achievable to date, though if Word continues to converge with XML, who knows?
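To give a feel for what an XML-to-RTF conversion involves, here is a minimal Python sketch. It is not the poster's XSLT. The element names `title` and `para`, the function names `rtf_escape` and `xml_to_rtf`, and the font and spacing values are all assumptions made for illustration; a real stylesheet would map a full DTD and a proper RTF style table.

```python
import struct
import xml.etree.ElementTree as ET


def rtf_escape(text):
    """Escape text for RTF: backslash, braces, and non-ASCII characters."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF \uN expects signed 16-bit UTF-16 code units, each followed
            # by a fallback character ("?") for readers without Unicode support.
            raw = ch.encode("utf-16-le")
            for unit in struct.unpack("<%dh" % (len(raw) // 2), raw):
                out.append("\\u%d?" % unit)
    return "".join(out)


def xml_to_rtf(source_xml):
    """Turn <title> and <para> elements (hypothetical names) into RTF paragraphs."""
    root = ET.fromstring(source_xml)
    parts = [r"{\rtf1\ansi\deff0\uc1{\fonttbl{\f0 Times New Roman;}}"]
    for el in root.iter():
        text = (el.text or "").strip()
        if not text:
            continue
        if el.tag == "title":
            # Bold, 16 pt (fs is in half-points), with space before and after.
            parts.append(r"\pard\sb240\sa120\b\fs32 " + rtf_escape(text) + r"\b0\par")
        elif el.tag == "para":
            parts.append(r"\pard\sa120\fs24 " + rtf_escape(text) + r"\par")
    parts.append("}")
    return "\n".join(parts)
```

The result is plain ASCII, so something like `open("out.rtf", "w", encoding="ascii").write(xml_to_rtf(xml_text))` is enough to produce a file Word can open. Inline markup, tables, and graphics are where the real effort goes, and this sketch ignores all of them.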


We use XSLT to go from XML to both RTF and OOXML. Our XML-to-OOXML process yields fully-styled Word docs with sophisticated functionality, including automated paragraph renumbering (outline numbering linked to styles) and dynamic cross-references. These documents support automated re-styling by end users (easy conversion to House Style), which is another of our requirements. The paragraph numbering (e.g. 3.4.5.6) that the editors see in the Arbortext screen FOSI matches that in the output Word documents. I don't know of any Word feature that cannot be supported in XML, and which cannot be achieved via XML-to-OOXML conversion, though I haven't "gone looking for trouble": I can only say that we're able to single-source our content in XML and support Web, Hard Copy, Word and PDF.
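For anyone wondering what the target of an XML-to-OOXML conversion looks like, the following Python sketch builds about the smallest .docx package I believe Word will open: a ZIP with a content-types part, a package relationships part, and `word/document.xml`. This is only an illustration, not the XSLT process described above. The source element names `title` and `para` and the function names are assumptions, and none of the styles, numbering, or cross-references mentioned in the post are attempted.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def paragraph(text, bold=False):
    """One WordprocessingML paragraph containing a single run of text."""
    run_props = "<w:rPr><w:b/></w:rPr>" if bold else ""
    return ('<w:p><w:r>' + run_props + '<w:t xml:space="preserve">'
            + escape(text) + '</w:t></w:r></w:p>')


def xml_to_docx_bytes(source_xml):
    """Map <title>/<para> (hypothetical names) to paragraphs and zip the package."""
    root = ET.fromstring(source_xml)
    body = []
    for el in root.iter():
        text = (el.text or "").strip()
        if el.tag == "title" and text:
            body.append(paragraph(text, bold=True))
        elif el.tag == "para" and text:
            body.append(paragraph(text))
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + "".join(body) + '</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return buf.getvalue()
```

Writing the returned bytes to a file with a `.docx` extension should give a document that opens in Word 2007 or later. Getting from here to styled output means adding further parts (styles, numbering) and referencing them from the paragraphs, which is where most of the work lies.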


You don't mention what DTD/schema you're converting from. I'm not very well up on this, but aren't there ready-made XSLT scripts that will convert from, for example, DocBook XML to RTF/OOXML? I know there are free DocBook to HTML and DocBook to PDF tools, so I would have expected DocBook-to-RTF to be out there too. Maybe it's not, since RTF is proprietary (and PDF isn't???)


Hope this helps.

Fellow Adepter James Sulak inspired me to tackle the XSLT route to the Word 2003 XML format ("WordML"). With knowledge of WordML structure it's doable. Again, a simple doctype helps.


See http://rep.oio.dk/Microsoft.com/officeschemas/wordprocessingml_article.htm


The Word 2003 XML format is virtually the same as the fully-realized Office Open XML introduced with Word 2007, without the compressed container folder structure (so you can transform to a single flat XML file), and is seamlessly converted to MS Word 2007/2010 with the built-in converter.
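The appeal of the flat Word 2003 format is that the whole document is a single XML file, so a transform only has to emit one tree. A minimal Python sketch of that idea follows. The source element name `para` and the function name `xml_to_wordml` are assumptions for illustration, and the namespace and `mso-application` processing instruction are the ones I understand Word 2003 to use; check them against the schema documentation before relying on them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def xml_to_wordml(source_xml):
    """Emit one flat WordML file with a paragraph per <para> (hypothetical name)."""
    root = ET.fromstring(source_xml)
    paragraphs = []
    for el in root.iter("para"):
        text = (el.text or "").strip()
        paragraphs.append("<w:p><w:r><w:t>" + escape(text) + "</w:t></w:r></w:p>")
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        # This processing instruction tells Windows to open the file with Word.
        '<?mso-application progid="Word.Document"?>\n'
        '<w:wordDocument xmlns:w="' + WORDML_NS + '"><w:body>'
        + "".join(paragraphs)
        + "</w:body></w:wordDocument>"
    )
```

Saving the returned string as UTF-8 with an `.xml` extension gives a file Word can open directly. In an XSLT version the same structure would simply be the output of the templates, with no ZIP packaging step.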


The main problems with the Arbortext ImportExport RTF export route are disappointing speed and expensive licensing. That said, we absolutely rely on ImportExport for conversion to XML of Word legacy and outside submissions. Only OmniMark is comparable as far as I'm aware.


Again, round-tripping is fraught.


Years ago I wasted a couple days looking at Word as an XML editor. It can do "user XML" as an overlay to the WordML but there's no real-time validation, poor performance, and no route back to Word from user XML.


- Lou Argyres
Continuing Education of the Bar - California
2100 Franklin St, Suite 500
Oakland, CA 94612
Lou.Argyres@ceb.ucla.edu

"Word is a Bad Idea (TM)"

I'll have to remember that one. Word 2.0 was pretty good for anything up to about 50 pages...and that was the last time I chose to use Word professionally (personally anyway...I still get stuck using it today depending on the client *sigh*).

Some of our operator manuals are in Word here *big sigh* (but they won't be any more - I set up stylesheets so they, too, can use the same XML) and we save to HTML, open in a browser, then cut and paste into a clean Word doc with a template with the styles we want to use. From there, it is just applying the styles one paragraph at a time.

Of course, by Word 2007, this is a bloody nightmare in and of itself...why in the world does Word apply normal.dot to everything now? Oh but to have Word 2.0 back...

John T. Jarrett CDT
Senior Tech Writer, Integrated Logistics Support,Land & Armaments/Global Tactical Systems

T832.673.2147 | M 832.363.7234 | F 832.673.2376| x1147 | -
BAE Systems, 5000 I-10 West, Sealy, Texas USA 77474
www.baesystems.com

Not only "Word is a Bad idea (TM)" but as Suzanne Napoleon says in her email sig, "WYSIWYG is last century technology".