
automate export of text from folder of Illustrator graphics?



We have a document we are contemplating migrating from FrameMaker to XML.
It contains several hundred mainframe screens that are currently authored
in Adobe Illustrator ... but are essentially just text. I am wondering if
there is a way to automatically export the text from the Illustrator files
into simple text files (after which I could convert/import them into our
XML where a warm, happy content model is just waiting to format them all).

So for example, the process would take a folder full of:

And output:

Also happy to hear about other (semi-)automated paths anyone else might
have taken for such an endeavor.

Paul Nagai


Doesn't AI have some automation built into it which can output text from an ai file?

John Sillari
Chief Technologist
Dayton T. Brown, Inc.

I did a quick Google search and came across the following. It talks about translation, but the idea is basically the same. It looks like data sets might work, but they might also require modifying the source file.

IIRC, AI files are really just PostScript with some extra info embedded in comment fields, or at least they were way back when. Is that still true? If so, you might be able to use Ghostscript to extract the text bits.

Thanks for the tips and pointers and links so far, everyone. The
conversation is advancing.

As always, balancing the acquisition costs (I don't currently have AI) plus
the development time vs. offshore (assuming licenses already exist there)
time to just cut and paste is the trick ...

Clay: Hmmm. I opened an .ai file with a text editor. While I can read some
portions, none of the readable bits contain the text of the screen, at
least not in plain text. It could be hashed or mashed or otherwise bashed.
(That might not have been quite what you meant, but I know I vaguely
remember being able to open many PostScript files and read some of the text

Ghostscript might be smarter about translating the text into human readable

Hi Paul--

A quick check to see if it's PostScript is to look at the first line. If it starts with "%!PS-Adobe-{version number}", then it's PostScript, and hopefully Ghostscript would be able to do something sensible with it.
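If it helps, here is a rough Python sketch of that first-line check run over a whole folder. The function names and the "*.ai" glob are my own assumptions for illustration. It also looks for a "%PDF-" header, since some Illustrator files are saved as PDF rather than PostScript.

```python
from pathlib import Path


def sniff_format(first_bytes: bytes) -> str:
    """Classify a file by the bytes at the very start of it."""
    if first_bytes.startswith(b"%PDF-"):
        return "pdf"
    if first_bytes.startswith(b"%!PS-Adobe-"):
        return "postscript"
    return "unknown"


def sniff_folder(folder: str) -> dict:
    """Map each .ai file name in the folder to its detected format."""
    results = {}
    for path in sorted(Path(folder).glob("*.ai")):
        with path.open("rb") as handle:
            # 16 bytes is enough to cover either header prefix.
            results[path.name] = sniff_format(handle.read(16))
    return results
```

Calling sniff_folder("screens") on a hypothetical "screens" directory would tell you up front which tool each file is likely to need.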

PostScript files, especially ones from design programs like AI, usually won't have the text contents in human-readable form. The text may be stored with a custom (sometimes binary) encoding, and even if it is there in ASCII, it may be broken down into individual characters so that each glyph can be placed precisely on the page. But Ghostscript is usually pretty good about sorting all that out and giving you back usable text, as long as the text layout in the original document isn't too wacky.
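For what it's worth, a minimal sketch of driving Ghostscript from Python might look like the following. It assumes a reasonably recent Ghostscript that includes the txtwrite output device and that the executable is on the PATH as "gs" (on Windows it is typically "gswin64c"). The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def gs_text_command(source: Path, target: Path, gs_binary: str = "gs") -> list:
    """Build a Ghostscript command line that writes extracted text to target."""
    return [
        gs_binary,
        "-q",                       # suppress startup banner
        "-dNOPAUSE", "-dBATCH",     # run non-interactively and exit when done
        "-dSAFER",                  # restrict file access from the input file
        "-sDEVICE=txtwrite",        # text-extraction output device
        f"-sOutputFile={target}",
        str(source),
    ]


def extract_text(source: Path, target: Path) -> None:
    """Run Ghostscript on one file; raises CalledProcessError on failure."""
    subprocess.run(gs_text_command(source, target), check=True)
```

How clean the result is will depend on how the text was laid out in the source file, so it is worth spot-checking a few screens before trusting a batch run.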


Hilarious. First line: %PDF-1.4

I can open it w/Acrobat and copy the text. LOL! Doesn't get me automation
(at least not directly) but it sure gets around the licensing issues
associated with Illustrator.

Cool. Thanks for the pointer. The PDF-guy is looking at it, too. I bet, if
nothing else comes up, pdftotext will work nicely. The AI/PDFs are super
simple and pretty much only contain the text and a box around it.
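
A rough sketch of the folder-in, text-files-out batch, assuming the pdftotext command-line tool (from Xpdf or Poppler) is installed and on the PATH, and that it will accept the PDF-based .ai files as they are. The directory layout and function names are made up for illustration. The -layout option asks pdftotext to preserve the physical layout, which seems like a sensible starting point for fixed-width screens.

```python
import subprocess
from pathlib import Path


def output_path_for(source: Path, output_dir: Path) -> Path:
    """Return the .txt path that a given source file should produce."""
    return output_dir / (source.stem + ".txt")


def pdftotext_command(source: Path, target: Path) -> list:
    """Build the pdftotext command line for one file."""
    return ["pdftotext", "-layout", str(source), str(target)]


def convert_folder(source_dir: Path, output_dir: Path) -> list:
    """Convert every .ai or .pdf file in source_dir; return the files written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(source_dir.iterdir()):
        if source.suffix.lower() not in {".ai", ".pdf"}:
            continue
        target = output_path_for(source, output_dir)
        subprocess.run(pdftotext_command(source, target), check=True)
        written.append(target)
    return written
```

Something like convert_folder(Path("screens"), Path("screens_txt")) would then leave one text file per graphic, ready for the XML conversion step.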


You might try saving your file from AI as SVG, and then importing
the SVG into IsoDraw. We've been bringing various drawing formats into
IsoDraw and have found that SVG gives better fidelity and editable text.
For example, we've gone from other vector drawing formats to PDF, opened
that in AI (CS4), saved it as SVG and then imported that. It's a bit of a
workaround, but there have been fewer glitches than with directly importing
some drawing formats or completely incompatible drawing formats. In your
case I'd save your AI file as SVG and see what happens when you open the
SVG file in IsoDraw; you may find you get better fidelity. It's worth a


That sounds pretty doable. I'm using pdftotext right now for a data conversion project and it works really well for the documents we're converting.