
SGML CDATA element problem




Hey folks, this is for the old timers out there, like Ed, Andy, Eliot,
myself and anyone else who really remembers SGML.

I have a DTD with an element with a content model of CDATA (SGML allows
this, XML DOES NOT). The element declaration is



Okay, I'm working on a handbook that explains how to tag with our DTD and
we want to provide examples of tagged data.

Now here's what's happening (Epic 5.1 on a Windoze XP SP1 machine). For
brevity and simplicity, I've been using the following sample.

<a.statement>

<text>Approved for public release; distribution is
unlimited.</text></a.statement>

What I want to do is put this statement into the <sgmlexample> element like
this

<sgmlexample>
<a.statement>

<text>Approved for public release; distribution is
unlimited.</text></a.statement>
</sgmlexample>

Now when I do this, one of two things happens.

1. If I try to copy and paste this from one Epic file to another, I am
told the paste is out of context. When I turn context rules OFF and paste
the sample, this is what I get.

<sgmlexample>Approved for public
release; distribution is unlimited.
</sgmlexample>

Notice that the <a.statement> tags disappeared.

2. When I copy the markup I want to an ASCII editor and then copy it again
and paste it, I get this:

<sgmlexample><a.statement>
DISTRIBUTION STATEMENT A.<text>Approved for public release;
distribution is unlimited.
</sgmlexample>

Notice that ALL my end tags have vanished.

Somewhere in the translation (sounds like a Bill Murray movie), I am losing
some tags. Most commonly it is all my end tags, since I thought option 2 was
the easier route. One thing I liked about this option is that when I looked at
the content of the <sgmlexample> element, the start-tag open delimiters
(stagos) had all been converted to entities (the text below was taken from the
markup inside my data from example 2). I didn't even think about end tags until
one of our other folks, working on another section of the handbook, noticed
the missing end tags.

&lt;a.statement> &lt;title>DISTRIBUTION STATEMENT A.&lt;text>Approved for
public release; distribution is unlimited.

Now I know that CDATA will not read markup except to look for entity
declarations and end tags. So somewhere Epic is truncating the end tags in
the paste buffer. Which is interesting because it is changing the start
tags to entities (and CDATA does not read the start tag, it treats it as
text, the difference between PCDATA and CDATA).

I tried changing my example entity file so that all the start and end tags
were entities, something like this:

&lt;a.statement>&lt;title>DISTRIBUTION STATEMENT
A.&lt;/title>&lt;text>Approved for
public release; distribution is unlimited.&lt;/text>&lt;/a.statement>

When I look at this by grabbing the entire <sgmlexample>, I see the above.
When I grab just the <sgmlexample> content, Epic has changed the &lt; to
<. So the above markup winds up looking like this:

<a.statement><title>DISTRIBUTION STATEMENT
A.</title><text>Approved for
public release; distribution is
unlimited.</text></a.statement>

Can we say frustrating 🙂 Anyone have any ideas?? At this point, XML IS
NOT an option though I do have control of the DTD. I have a feeling this
has something to do with the way CDATA is processed and I am going to be
stuck with having to fix this MANUALLY (think I can use a regular
expression fix, but darn it's not what I want to do).

Lynn

Re: SGML CDATA element problem

CDATA element content ends at *the first end tag*
encountered, not just at the "matching" end tag of
the CDATA element.

I know, this is counter-intuitive, and this makes
SGML's CDATA element content pretty worthless for
what you want to do. But that's SGML's CDATA element
content for you.

So Epic is trying to recover as best it can by
tossing the end tags that would end sgmlexample
so that at least you still have a document whose
tags are well-balanced (what in the XML world
would be called "well-formed").
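
The rule described above can be sketched in a few lines of Python. This is only an illustration, not how Epic or any particular SGML parser is implemented: it assumes the reference concrete syntax, where "</" followed by a letter is recognized as an end tag, and the name cdata_content_end is made up for the example.

```python
import re

# Assumption: reference concrete syntax, where ETAGO is "</" and is only
# recognized when followed by a name-start character (a letter).
_ETAGO = re.compile(r"</[A-Za-z]")


def cdata_content_end(text: str) -> int:
    """Return the index at which CDATA element content would end.

    The element name in the end tag is NOT checked: the first end tag
    of any kind terminates the content.
    """
    match = _ETAGO.search(text)
    return match.start() if match else len(text)


sample = (
    "<a.statement>\n"
    "<text>Approved for public release; distribution is\n"
    "unlimited.</text></a.statement>\n"
)

end = cdata_content_end(sample)
content = sample[:end]    # what the CDATA element would actually contain
leftover = sample[end:]   # starts with the end tag that closed the content

print(repr(content))
print(repr(leftover))
```

In this sketch the content stops just before </text>, so the start tags survive as plain text while every end tag falls outside the element, which matches the behavior Lynn saw.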

To accomplish what you want, you may be able to
use a CDATA marked section (which, by the way,
also exists in XML, fwiw).
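
As a rough sketch of that suggestion, the helper below wraps a piece of sample markup in a CDATA marked section. The function name is hypothetical, and it assumes the enclosing element is declared with #PCDATA (or mixed) content so that the marked section is actually recognized; inside CDATA element content it would just be more text.

```python
def wrap_in_cdata_section(markup: str) -> str:
    """Wrap markup in a CDATA marked section: <![CDATA[ ... ]]>.

    Inside a CDATA marked section only the closing "]]>" is recognized,
    so the sample itself must not contain that sequence.
    """
    if "]]>" in markup:
        raise ValueError("sample contains ']]>' and would end the section early")
    return "<![CDATA[" + markup + "]]>"


sample = (
    "<a.statement>\n"
    "<text>Approved for public release; distribution is\n"
    "unlimited.</text></a.statement>"
)

wrapped = wrap_in_cdata_section(sample)
print(wrapped)
```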

paul

Re: SGML CDATA element problem



Paul,

Thanks, that is what I figured the situation would be. What was getting me
is that Epic would change the stago to the &lt; entity but not the etago
(though in 8879 the stago is < and the etago is </, so from that
perspective I can understand the difference). Five years ago I would have
suggested having Epic look at the data going into the CDATA element and
convert all < to &lt;. Now, as SGML slowly withers away, it just isn't
worth it. I do thank Arbortext for the continued SGML support.
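
For anyone who ends up doing that conversion outside of Epic, here is a minimal sketch of the text transformation only. The function name is made up, and it assumes the escaped text ends up somewhere entity references are resolved (for example #PCDATA content); in true CDATA element content the &lt; would not be expanded.

```python
def escape_markup(markup: str) -> str:
    """Replace markup-significant characters with entity references.

    The ampersand is escaped first so that the "&" introduced by "&lt;"
    is not escaped a second time.
    """
    return markup.replace("&", "&amp;").replace("<", "&lt;")


sample = "<text>Approved for public release; distribution is unlimited.</text>"
escaped = escape_markup(sample)
print(escaped)
```

Because both "<" and "</" begin with the same character, this treats start tags and end tags alike, so no end tag is left for the parser to act on.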

While I do have you on the spot, any idea as to why trying to copy the
markup directly from Epic drops the initial wrapper element's start and end
tags? Something about maybe putting something like an OID (or similar) in
the Epic paste buffer? Or a balanced tag thing? What I saw was a bit
unexpected.

For what we are trying to do, I don't think CDATA marked sections really
would work though I may give it a try and see what happens. One reason
that marked sections probably wouldn't work is that we use marked sections
in our text entities to differentiate between page based and electronic or
to include other services (got to love the US DOD) notices and variations.
As we try to show those, things could get real confused real easy, and they
are already confused enough. 🙂

Lynn

Re: SGML CDATA element problem

Lynn,

By making sgmlexample's content model CDATA, you are
saying that you DON'T want markup escaped. That's what
CDATA means.

If you just defined sgmlexample's content model as #PCDATA,
and made sure you told Epic to:

set sgmlselection=off

(see help topic 9097) then when you paste

<a.statement>

<text>Approved for public release; distribution is
unlimited.</text></a.statement>

into sgmlexample in Epic editor, Epic should do what you

Re: SGML CDATA element problem



Paul,

Yep, for two reasons,

1. It's a pain to deal with
2. Without a DTD/schema to tell you if the content is PCDATA or CDATA, it
would all be treated as PCDATA, at which point, see number 1.

Again, many thanks.

Lynn