1-Visitor
March 29, 2011
Question

Arbortext Export to MS Word, reprise


Hi,


We have a business requirement to export documents from Arbortext 5.4 M020 directly into MS Word format. What is the best we can do for this?


From what I've read, export to MS Word from Arbortext is only supported via RTF, taking care to use special "Word fields", "styles", "instructions" and/or stylesheets. And, even then, the Arbortext Styler Guide & Help files suggest there are plenty of limitations and "gotchas" lurking along this path.


One of our "power" users suggests she has used elsewhere a feature / add-on to EPIC / Editor that can be used to directly export to, or compose into, a Microsoft Word (i.e., ".doc") format. Does this sound familiar to anyone? What is she talking about? She can't remember the feature, but believes it may be a separately-licensed add-on.


Any tips or leads will be carefully followed up!


Regards,


-- Marty


    13 replies

    1-Visitor
    March 29, 2011
    There is an Import/Export product that would do this.

    I would also look into the XML capabilities of Word and see what it would
    take to open the XML directly in Word. Word is supposed to support
    arbitrary XML; I think the only caveat is that it needs a schema to do this.
    I believe you would basically set up a set of styles in Word for the XML you
    have and do all the work within Word.

    A lot would rest on how you want to format within Word - how fancy does it
    need to be? How pretty in terms of page breaks and final layout. Either of
    these approaches will benefit from some hand tweaking.

    ..dan

    1-Visitor
    March 29, 2011
    We've used the Nuance OmniPage 17 software to convert PDF files that came from Arbortext into MS Word.
    1-Visitor
    March 29, 2011
    We wrote our own XML-to-Word conversion. It's XSLT and converts the XML in our custom schema directly to the Open XML standard for Word 2007 (.docx). Not sure I would recommend anyone going down this path, however (it was a long and difficult venture). If you can generate a PDF, there are some excellent 3rd-party tools that convert PDF to .docx, .pptx, etc.

    Dave
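
To give a feel for what "converting XML directly to .docx" involves, here is a minimal sketch in Python rather than XSLT. It is not the conversion described above: the source element names `title` and `para` and the function name `xml_to_docx_bytes` are assumptions made up for illustration. It writes only the three package parts a bare WordprocessingML document needs and applies no styling at all.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Package-level parts that every .docx needs, whatever the content.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def xml_to_docx_bytes(source_xml: str) -> bytes:
    """Turn <title>/<para> elements (hypothetical names) into a bare .docx."""
    root = ET.fromstring(source_xml)
    paragraphs = []
    for el in root.iter():
        if el.tag in ("title", "para"):
            # Flatten inline markup and normalize whitespace.
            text = " ".join("".join(el.itertext()).split())
            paragraphs.append(
                '<w:p><w:r><w:t xml:space="preserve">'
                + escape(text)
                + "</w:t></w:r></w:p>"
            )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        + "".join(paragraphs)
        + "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", RELS)
        zf.writestr("word/document.xml", document)
    return buf.getvalue()
```

Writing the returned bytes to a file with a .docx extension should give something Word can open. Styles, numbering, tables, and images all need further package parts, which is where most of the effort in a real conversion goes.
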
    18-Opal
    March 29, 2011
    Hi Dan & Marty--

    Dan, Import/Export is the thing Marty refers to that uses RTF as an
    intermediate format for exporting to Word. I also hadn't heard that Word
    is now some kind of general XML editor--do you have any pointers to web
    sites or other resources that describe this? I know it uses XML under
    the hood for its new .docx format, but I don't know about editing for
    arbitrary schemas. In that case does it enforce the schema constraints
    to prevent you from making the document invalid?

    Exporting to Word or round-tripping from Word is something that business
    users frequently ask for, because they don't quite understand how XML
    authoring is different from Word authoring. One of the reasons that it's
    hard to solve this problem of getting XML data to and from Word is
    because they *are* very different things. In most cases the right
    solution is to try to educate the business users about the differences
    (and the advantages of structured authoring, of course), so they
    understand why sticking their content in Word is a Bad Idea (TM).

    As a workaround, if you are just trying to get something users can open
    in Word for the sake of reviewing, and approximate formatting is good
    enough, you might try publishing to HTML and then importing the HTML
    into Word. We used that process where I used to work and it was a
    reasonable compromise for users who didn't have Arbortext and just
    wanted to be able to mark up a file with review comments.

    --Clay

    Clay Helberg
    Senior Consultant
    TerraXML
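
As a rough illustration of the HTML route for review copies, the sketch below turns a small XML document into plain HTML that Word can open. It is a stand-in for a real publishing step, not Arbortext's HTML output: the element names `title` and `para`, the tag mapping, and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source elements to HTML elements.
TAG_MAP = {"title": "h1", "para": "p"}


def xml_to_review_html(source_xml: str) -> str:
    """Produce simple, approximately formatted HTML for review in Word."""
    root = ET.fromstring(source_xml)
    body = []
    for el in root.iter():
        html_tag = TAG_MAP.get(el.tag)
        if html_tag is None:
            continue  # unmapped elements are skipped in this sketch
        # Flatten inline markup and normalize whitespace.
        text = " ".join("".join(el.itertext()).split())
        body.append("<" + html_tag + ">" + escape(text) + "</" + html_tag + ">")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Review copy</title></head>\n'
        "<body>\n" + "\n".join(body) + "\n</body></html>\n"
    )
```

Saving the result as an .html file and opening it in Word gives reviewers something they can comment on. The structure of the source markup is lost, so this suits one-way review rather than round-tripping.
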
    1-Visitor
    March 29, 2011
    > We've used the Nuance OmniPage 17 software to convert PDF files that came
    > from Arbortext into MS Word.

    My experience with output from tools like that is that it looks good but is
    not in a form that you want to edit. I've found that they tend to build each
    page like a graphic, out of separate text blocks, to try to maintain page
    content and line breaks (page fidelity). With the text split into blocks
    like that, you can't edit it as a continuous document.

    Ultimately it comes down to your definition of being in Word and the final
    use for that format.

    ..dan
    1-Visitor
    March 29, 2011
    Speaking of which, does anyone have a map for Arbortext to Word for MIL SPEC 38784C-BV7? (never hurts to ask)
    1-Visitor
    March 29, 2011
    Hi Marty,



    Usually I have the problem of going the other way, from Word to SGML
    into our custom DTD. No two Word documents are ever the same. 🙂



    I'd be thinking about a couple of things:



    Is the intent to round trip the content as part of some editing cycle?
    Losing the value of your markup is definitely an issue and indicates a
    complete lack of understanding by the customer. My concern would be that
    you will lose your valuable markup on export, and if the document is
    edited and returned, you face any amount of non-compliant styles which
    have suddenly become part of the Word document for any potential return
    trip.



    How much do I want to fight with this to get the output into Word?
    Workarounds always cost time and money in the end.



    What does the output to Word have to look like? I sometimes drop my
    markup into a text editor and clean it up with OmniMark if people don't
    want to see any markup. I can insert that content into any text editor.
    I believe it is possible to create a "styled" RTF file using OmniMark,
    but again you have to create that scripting yourself. I haven't tried it
    myself.



    Most often I find that people will prefer a PDF information product
    generated from Arbortext software. Acrobat Professional will convert
    such documents to MS Word, but not nicely in all cases, and that often
    forces an attempt to use OCR software, which can introduce errors into
    the text. I really don't recommend going that way since you can't be
    certain the content is correct without a review process, and what then
    if mistakes are found?



    I just recently had a conversation with PTC regarding Arbortext
    Import/Export, some of which I'll pass on here. It can be obtained as a
    plug-in for the editor. It will let you export XML documents to RTF for
    use by any application that opens the RTF format. I quote the
    documentation here: "Using Arbortext Import/Export, you can export XML
    documents to RTF based on stylesheet definitions you create using
    Arbortext Styler. Arbortext Styler must be installed and licensed to
    create export stylesheets and to preview RTF documents being exported
    with Arbortext Import/Export." It looked like a pretty good tool to me. At
    the moment I have neither, but I'd like to have them. 😉



    Greg


    1-Visitor
    March 29, 2011
    For what it's worth, we've done the same thing (using Word 2003 WordML) for several custom document types and it works well, and wasn't too awful to set up. But they're fairly simple document types, so your mileage may vary.

    -James
    1-Visitor
    March 29, 2011
    Marty:



    Arbortext Export has both advantages and limitations. The viability often
    depends on who the consumer of the Word document is to be.



    If the purpose is ...

    1. content review, such as review by subject matter experts who do not have
    (nor want) an XML Editor,

    2. and if PDF review is considered unwieldy,

    3. and you are using a Styler stylesheet...



    then Arbortext Export can be a great solution.

    Granted those are three qualifiers, but often they are all true at the same
    time.



    The RTF output, by and large, looks identical to Word .doc or .docx files,
    and looks similar to PDF output from the same stylesheet. There is a
    limitation that graphics which need to be scaled in Word must also be
    embedded rather than linked, but if item 1 (above) is true, you probably
    want embedded graphics anyway, in order to provide one RTF file (like PDF)
    rather than a collection of a .doc file plus many image files (like HTML
    output). There are other limitations, such as the export of tables within
    tables, but that's not usually an issue.

    Out of the box:

      • A Styler stylesheet for your document type will export RTF that
        generally looks and feels like PDF output from Styler.

      • It will automatically create a Table of Contents as a TOC field,
        but the TOC will have to be generated once, manually or
        programmatically, by Word or by the person viewing the file, using
        Word's "Update Fields" command (CTRL-A, followed by F9).

      • A single checkbox in Styler allows you to embed all graphics (that
        Word will support as embedded graphics).

      • A single checkbox allows you to generate the Word built-in list
        styles for numbered lists and bullet lists.

      • Links, cross-references, footnotes, tables, index terms, and
        indexes all work out of the box, although indexes have the same
        one-time limitation as TOCs because INDEX fields generate the index
        tables (and there is no automatic updating of certain generated text
        in Word).

      • Headers and footers are exported as defined by Styler, although there
        are a few limitations.



    If the goal is page-by-page fidelity with your Styler-generated PDF output,
    RTF Export will be frustrating.



    If the purpose is for RTF to be the primary composition output for your
    documents, RTF Export is not the best choice.



    If smarter, advanced Word documents are the goal, that is to say, Word
    documents which use Word styles to do the things in a Word-like manner, you
    can do that using Styler RTF-specific features. Block elements can be
    mapped to built-in or user-defined Word styles. Inline elements can be
    mapped to Word character styles. TOCs, Indexes, Word fields, and Headers
    and Footers can be built according to Word specifications rather than
    standard Styler practices. Customer-derived Word template (.dot) files can
    be embedded in, and referenced by, the RTF output.
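
The idea of mapping block elements to paragraph styles and inline elements to character styles can be sketched outside Styler as a pair of lookup tables. The example below is only an illustration and swaps in a different technique: it emits WordprocessingML (the .docx markup) rather than the RTF that Styler produces, and the element names (`title`, `para`, `emph`, `code`) and style IDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical mappings; the style IDs must exist in the target Word template.
PARAGRAPH_STYLES = {"title": "Heading1", "para": "BodyText"}
CHARACTER_STYLES = {"emph": "Emphasis", "code": "CodeChar"}


def run(text, char_style=None):
    """Build one w:r run, optionally tagged with a character style."""
    if not text:
        return ""
    rpr = ""
    if char_style:
        rpr = '<w:rPr><w:rStyle w:val="' + char_style + '"/></w:rPr>'
    return (
        "<w:r>" + rpr + '<w:t xml:space="preserve">' + escape(text) + "</w:t></w:r>"
    )


def paragraph(block):
    """Build one w:p paragraph from a block element with inline children."""
    style = PARAGRAPH_STYLES.get(block.tag, "Normal")
    runs = [run(block.text)]
    for inline in block:
        # Text inside the inline element gets the mapped character style;
        # the tail text after it goes back to the paragraph's default.
        runs.append(run("".join(inline.itertext()), CHARACTER_STYLES.get(inline.tag)))
        runs.append(run(inline.tail))
    return (
        '<w:p><w:pPr><w:pStyle w:val="' + style + '"/></w:pPr>'
        + "".join(runs)
        + "</w:p>"
    )
```

For example, `paragraph(ET.fromstring("<para>Press <emph>Enter</emph> now.</para>"))` yields a paragraph tagged `BodyText` containing three runs, the middle one tagged `Emphasis`. Because the output carries style names rather than hard-coded formatting, a reviewer can restyle the document in Word by changing the style definitions.
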



    This more advanced level of effort is more applicable if the consumer of the
    Word doc is going to be making extensive changes to the RTF file, adding new
    content, new headings, new lists, etc. I don't mean to imply that
    round-tripping the information back to Editor is a good idea without
    realizing that round-tripping is a long and winding (expensive) road, and is
    sometimes an impassable road. If planned carefully within the boundaries of
    Import/Export limitations, Word limitations, and the limitations of logic 🙂,
    round-tripping can be useful in a review process, but there are significant
    hurdles.



    So the bottom line is, if your use case fits and you already have a Styler
    stylesheet, then Arbortext Export is probably a short path to nice-looking
    Word documents. They cannot have the same degree of formatting complexity
    as PDF output from Arbortext, because Word is, after all, merely a flat
    paragraph-based word processor. There are limitations, many of which are
    defined by Word's feature set.



    Hopefully this helps you too, along with all the other info you have seen
    today.



    Mark Lambert

    Oberon Technologies, Inc.


    1-Visitor
    March 30, 2011








    Hi Marty: Re. "from Arbortext 5.4 M020 directly into MS Word format": I wouldn't start from there. Can your requirement be restated as "from XML into OOXML" (docx, Word 2007 and later) or "into RTF" (Word 97 and earlier)? I think couching this in app-to-app terms is problematic, as you're limited/tied to the apps themselves. XML is ideal for converting to other formats, so I would prefer to couch the problem in terms of "format-to-format" rather than "app-to-app". If you state the problem in those terms, I think it becomes more achievable, depending on your resources 🙂


    We have for years converted XML to both RTF (just give the documents a .doc extension) and, more recently, to OOXML (docx). You don't mention round-tripping, so I assume it's not a requirement. Just as well, as AFAIK no one has yet managed it without unacceptable loss of data/quality. Personally I don't think it's been achievable to date, though if Word continues to converge with XML, who knows?


    We use XSLT to go from XML to both RTF and OOXML. Our XML-to-OOXML process yields fully-styled Word docs with sophisticated functionality, including automated paragraph renumbering (outline numbering linked to styles) and dynamic cross-references. These documents support automated re-styling by end users (easy conversion to House Style), which is another of our requirements. The paragraph numbering (e.g. 3.4.5.6) that the editors see in the Arbortext screen FOSI matches that in the output Word documents. I don't know of any Word feature that cannot be supported in XML, and which cannot be achieved via XML-to-OOXML conversion, though I haven't "gone looking for trouble": I can only say that we're able to single-source our content in XML and support Web, Hard Copy, Word and PDF.
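
For the RTF side, a format-to-format conversion can be surprisingly small at its simplest. The sketch below uses Python rather than XSLT and is not the process described above: the element names `title` and `para`, the fixed formatting, and the function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def rtf_escape(text: str) -> str:
    """Escape RTF's special characters and encode non-ASCII as \\uN? units."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF expects signed 16-bit UTF-16 code units, each followed
            # by a fallback character for readers without Unicode support.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                n = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append(f"\\u{n}?")
    return "".join(out)


def xml_to_rtf(source_xml: str) -> str:
    """Convert <title>/<para> elements (hypothetical names) to basic RTF."""
    root = ET.fromstring(source_xml)
    lines = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"]
    for el in root.iter():
        if el.tag not in ("title", "para"):
            continue
        # Flatten inline markup and normalize whitespace before escaping.
        text = rtf_escape(" ".join("".join(el.itertext()).split()))
        if el.tag == "title":
            # Bold, 16 pt (\fs is in half-points), with space before/after.
            lines.append(r"\pard\sb240\sa120{\b\fs32 " + text + r"}\par")
        else:
            lines.append(r"\pard\sa120{\fs24 " + text + r"}\par")
    lines.append("}")
    return "\n".join(lines)
```

The output is pure ASCII, so it can be written with `encoding="ascii"` to a file that Word will open as RTF. A production conversion would also emit a stylesheet table and style references so that the result is restyleable rather than hard-formatted.
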


    You don't mention what DTD/schema you're converting from. I'm not very well up on this, but aren't there ready-made XSLT scripts that will convert from, for example, DocBook XML to RTF/OOXML? I know there are free DocBook to HTML and DocBook to PDF tools, so I would have expected DocBook-to-RTF to be out there too. Maybe it's not, since RTF is proprietary (and PDF isn't???)


    Hope this helps.