1-Visitor
November 20, 2014
Question

How to convert large number of MS Word docs to XML


Hi,

Is there a way to convert a couple of hundred MS Word documents to Arbortext XML? I know that you can develop a map template using import/export, but that isn't a viable option: the project is under a time constraint, and the documents are not necessarily similar in formatting or styling. I'm hoping there is some sort of script that can be run to convert the documents. Ideally the output would be DITA specifically, but if I could just get the content into non-DTD-specific XML (without having to copy and paste), that would still be great.

So, ideally a mass-conversion to DITA, but even just automatically getting it into XML for future manipulation would be a good result. Any ideas would be much appreciated!

Thanks

    1 reply

    16-Pearl
    November 20, 2014

    I'm not really sure of the volumes you are talking about (200 x 10pp? 200 x 1000pp?) but for large volumes of documents, with mismatched styling, it is often easier (and cheaper) to send these offshore for processing. I can recommend a partner company we use if you're interested.

    I should note that if you want to do this by hand, Arbortext 6.0 and above let you copy+paste MS Word content directly in and it will be converted to XML on the fly. Not necessarily "perfect" XML but it gives you a starting place.
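
    For the fallback case of getting content into non-DTD-specific XML with a script, here is a minimal sketch in Python. It assumes the files are .docx (Word 2007 or later) rather than legacy .doc, since a .docx is a ZIP archive whose body lives in word/document.xml. The function names and the output element names (document, para, the style attribute) are made up for illustration. This is not Arbortext's import mechanism and the result is not DITA; it only keeps paragraph text and the Word paragraph style ID so the content can be transformed further.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_generic_xml(docx_path):
    """Build a generic <document> element with one <para> per non-empty Word paragraph.

    The element and attribute names are hypothetical, not an Arbortext or DITA schema.
    """
    with zipfile.ZipFile(docx_path) as zf:
        body_xml = zf.read("word/document.xml")
    root = ET.fromstring(body_xml)

    out = ET.Element("document", {"source": Path(docx_path).name})
    for p in root.iter(W + "p"):
        # Join the text of all runs; tabs, line breaks, and images are ignored here.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if not text.strip():
            continue
        para = ET.SubElement(out, "para")
        style = p.find(W + "pPr/" + W + "pStyle")
        if style is not None:
            # Keep the Word style ID (e.g. "Heading1") as a hint for later mapping.
            para.set("style", style.get(W + "val", ""))
        para.text = text
    return out


def convert_folder(src_dir, dst_dir):
    """Convert every .docx directly inside src_dir; return the list of written XML paths."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for docx in sorted(Path(src_dir).glob("*.docx")):
        target = dst / (docx.stem + ".xml")
        tree = ET.ElementTree(docx_to_generic_xml(docx))
        tree.write(str(target), encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

    Calling convert_folder("word_docs", "xml_out") (both folder names are placeholders) would write one XML file per document. Because the styling differs between documents, mapping the style values to DITA elements would still need a per-style decision afterwards, for example in an XSLT step.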

    rgrobler (Author)
    1-Visitor
    November 21, 2014

    Thanks for the reply Gareth. I would definitely be interested in contacting the partner company. Could you send me their details?