Importing Word into structured text and HTML.

mkoskinen
1-Newbie

I have several Word documents with the typical content (text, tables, figures). My goal is to automagically convert these Word documents into structured text, which could then be formatted into HTML. How can I import a Word document into Arbortext? Do I need a separate import/export tool or something like that? I have a trial license and I only see the Arbortext editors in my Start menu, no import tool.

1 REPLY

Hi Marko--

The first thing I would try is copy and paste. I don't know if it will handle your graphics, but it will try to do something reasonable with the main content and with tables. You might not get exactly what you hope for, but you should get something that you can clean up without too much effort. A lot depends on how well the source Word documents are formatted--do they use consistent paragraph and character styles, are they free from "format fudging" such as using empty paragraphs to insert white space, does the content structure make sense, etc.
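If you want a rough idea of how clean a source document is before pasting it, you could count its paragraph styles and empty paragraphs. This is only a sketch, assuming the file is a .docx (a zip archive with the body in word/document.xml); the function name audit_docx and the "(default)" label are made up for illustration, and it has nothing to do with Arbortext itself.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def audit_docx(path):
    """Return (style_counts, empty_paragraph_count) for a .docx file.

    Hypothetical helper: `path` may be a file name or a file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    styles = Counter()
    empty = 0
    for para in root.iter(W + "p"):
        # Paragraphs without an explicit w:pStyle use the document default.
        style = para.find(W + "pPr/" + W + "pStyle")
        name = style.get(W + "val") if style is not None else "(default)"
        styles[name] += 1

        # A paragraph with no visible text counts as empty here. Note that
        # this also catches paragraphs holding only a picture.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if not text.strip():
            empty += 1
    return styles, empty
```

Calling audit_docx("report.docx") on one of your files (the file name is just an example) gives you a style tally and an empty-paragraph count. A handful of consistently used styles and few empty paragraphs usually means less cleanup after the paste.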

If that doesn't work well enough, I've also had luck using LibreOffice to open a Word document and save it as DocBook XML. From that point you should be able to convert to your target doctype.
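If you have more than a few files, the LibreOffice step can probably be scripted instead of done by hand. The sketch below assumes the soffice executable is on your PATH and that your LibreOffice build exposes its DocBook export under the filter name "DocBook File"; I have not verified that name against every version, so treat it as an assumption and check your install. The helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def docbook_command(source, outdir, soffice="soffice"):
    """Build the LibreOffice command line for a headless conversion.

    The "xml:DocBook File" filter spec is an assumption about the local
    LibreOffice installation, not something confirmed in this thread.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "xml:DocBook File",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_docbook(source, outdir="."):
    """Run the conversion and return the expected output path."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(docbook_command(source, outdir), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(source).stem + ".xml")
```

Looping convert_to_docbook over a folder of Word files would give you one DocBook file per document, which you can then open and convert to your target doctype as described above.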

Arbortext does offer a very powerful Import feature, but you have to pay an extra license fee for it, and it is probably overkill for your use case unless you have a large number of similarly-formatted Word documents to convert. It also takes a bit of training to get the hang of developing templates to map input styles into output markup.

--Clay
