How to convert large number of MS Word docs to XML

rgrobler
7-Bedrock

Hi,

Is there a way to convert a couple of hundred MS Word documents to Arbortext XML? I know that you can develop a map template using import/export, but that isn't a viable option: the project is under a time constraint, and the documents are not necessarily similar in formatting or styling. I'm hoping there is some sort of script that can be run to convert the documents. Ideally the conversion would go specifically to DITA, but if I could just get the content into non-DTD-specific XML (without having to copy and paste), that would still be great.

So, ideally a mass-conversion to DITA, but even just automatically getting it into XML for future manipulation would be a good result. Any ideas would be much appreciated!

Thanks

4 REPLIES

I'm not really sure of the volumes you are talking about (200 x 10pp? 200 x 1000pp?) but for large volumes of documents, with mismatched styling, it is often easier (and cheaper) to send these offshore for processing. I can recommend a partner company we use if you're interested.

I should note that if you want to do this by hand, Arbortext 6.0 and above let you copy+paste MS Word content directly in and it will be converted to XML on the fly. Not necessarily "perfect" XML but it gives you a starting place.

Thanks for the reply Gareth. I would definitely be interested in contacting the partner company. Could you send me their details?

Also, what is the process one should follow when doing the copy and paste? I invariably get an "invalid paste structure" response when attempting this. What am I missing?

Try searching the support Knowledge Base first. It might be that you are trying to paste into a custom DTD structure which is not configured for use with the "Smart Paste" feature.
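
For the scripted route asked about in the original question, here is a rough sketch of a batch extraction to generic XML. It is not an Arbortext feature and does not produce DITA. It assumes the files are .docx (not legacy .doc), relying on the fact that a .docx is a zip package whose body text lives in word/document.xml. The output element names (document, para), the style attribute, and the function names are all made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_to_generic_xml(docx_path):
    """Build a <document> element with one <para> per non-empty Word paragraph.

    The <document>/<para> vocabulary is hypothetical, not a DITA or Arbortext schema.
    """
    with zipfile.ZipFile(docx_path) as package:
        body_xml = package.read("word/document.xml")
    source = ET.fromstring(body_xml)

    doc = ET.Element("document", {"source": Path(docx_path).name})
    for p in source.iter(W + "p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if not text.strip():
            continue
        para = ET.SubElement(doc, "para")
        # Keep the Word style id (if any) so headings can be mapped later.
        style = p.find(W + "pPr/" + W + "pStyle")
        if style is not None:
            para.set("style", style.get(W + "val", ""))
        para.text = text
    return doc


def convert_folder(in_dir, out_dir):
    """Convert every .docx in in_dir and write one .xml per file into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for docx_path in sorted(Path(in_dir).glob("*.docx")):
        doc = docx_to_generic_xml(docx_path)
        target = out / (docx_path.stem + ".xml")
        ET.ElementTree(doc).write(str(target), encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

Calling convert_folder("word_docs", "xml_out") (both folder names are placeholders) would leave one flat XML file per document. Tables, lists, images, and inline formatting are flattened or dropped, so the result is only raw material for a later transform into DITA, where the preserved style ids can help decide what becomes a title or a section.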
