
DOCTYPE declaration sneaking into Schema based XML

ptc-953960

In Editor (5.3 M020), we are seeing a DOCTYPE declaration occasionally
appear in our DITA-based, schema-aware XML after it is checked into our
repository. Nobody inserts it intentionally; it appears at random,
without any notice to the user.

When the writer opens or checks out an affected document from our
content management system, they'll be greeted with a dialog window that
says:

vvvvvvv
dialog title: Invalid Schema/DTD file
dialog text: "[A15000] Unable to load document type due to errors
parsing DTD: ..." + file path of the schema file.
Choose:
Browse for Schema/DTD
Open in free-form mode without Schema/DTD
Open as text
OK Cancel Help
^^^^^^^

For example, we'll have the following valid XML fragment (I added white
space to make it more readable):


<sampledita
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="SAMPLEDITA.xsd">...

Then it will suddenly be changed to (I added white space to make it more
readable):

<!DOCTYPE SAMDPLEDITA SYSTEM "SAMDPLEDITA.xsd">
<sampledita
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="SAMPLEDITA.xsd">...

The difference between the two examples above is the insertion of the
line:

<!DOCTYPE SAMDPLEDITA SYSTEM "SAMDPLEDITA.xsd">

The problem is that the document cannot be rendered in Editor without
selecting an option in the above dialog window and the writers are
clueless as to when the insertion is happening. The writers end up
checking the altered document into our content repository. The problem
does not become known until an attempt is made to read or check-out the
XML.
Moreover, I'm unable to get a handle on the DOCTYPE declaration (in ACL
or JavaScript) when the XML is instantiated in Editor (after selecting
the dialog window option: "Open in free-form mode without Schema/DTD").
Perhaps I can't get a handle on it because the option I had to select
excludes it from the instance rendered in Editor? I was hoping to search
for and delete the doctype declaration before check-in, but I have not
been able to get a programmatic handle on that node, if it exists at all
when the document is instantiated in Editor.

When we use the command "edit -current -untagged", the line
<!DOCTYPE SAMDPLEDITA SYSTEM "SAMDPLEDITA.xsd"> appears. We can then
manually remove the offending line, check the document back in, and the
problem is solved.

I would like to be able to remove the doctype declaration in a script
before check-in as a work-around, but ultimately I'd like to know what
is causing the insertion of the doctype declaration. The problem seems to
occur rarely, roughly 1% to 5% of the time.

Two questions:
1) has anyone else had doctype declarations appear in their schema-based
XML?
2) is there a way to get a handle and subsequently delete the doctype
declaration?

John
--



John Laurence Poole | Principal Software Engineer | 650.607.0853
Oracle User Assistance Engineering
M/S 2op1070
500 Oracle Parkway
Redwood Shores CA 94065-1677

Oracle Instant Chat: john.poole

The statements and opinions expressed here are my own and do
not necessarily represent those of Oracle Corporation.
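One possible answer to question 2, sketched outside Editor: rather than getting a handle on the declaration through ACL or the Editor's document tree, clean the saved file itself before it goes to the repository. The Python below is a minimal sketch under stated assumptions, not Arbortext functionality; strip_stray_doctype and fix_file are hypothetical names. It assumes UTF-8 files, treats a document as schema-based when it carries an xsi:noNamespaceSchemaLocation or xsi:schemaLocation attribute, and removes only a DOCTYPE without an internal subset (like the SYSTEM "SAMDPLEDITA.xsd" one above), since an internal subset may hold entity declarations the document needs. It edits the raw text, so everything other than the declaration stays byte-for-byte as it was; it does not skip comments or CDATA sections, so a "<!DOCTYPE" string inside either would confuse it.

```python
import re

# Matches a DOCTYPE declaration with NO internal subset (no "[" before the
# closing ">"), e.g. <!DOCTYPE SAMDPLEDITA SYSTEM "SAMDPLEDITA.xsd">, plus the
# line break that follows it. Assumption: declarations WITH an internal subset
# are left alone because they may carry entity declarations.
DOCTYPE_NO_SUBSET = re.compile(r"<!DOCTYPE\s+[^\[>]*>[ \t]*\r?\n?")

# Evidence that the instance is meant to be validated against a W3C schema.
SCHEMA_HINT = re.compile(r"xsi:(?:noNamespaceSchemaLocation|schemaLocation)\s*=")


def strip_stray_doctype(xml_text: str) -> tuple[str, bool]:
    """Remove a stray DOCTYPE from a schema-based XML document (hypothetical helper).

    Returns (possibly modified text, True if a declaration was removed).
    """
    if not SCHEMA_HINT.search(xml_text):
        # Not a schema-based instance: a DOCTYPE here may be legitimate.
        return xml_text, False
    new_text, count = DOCTYPE_NO_SUBSET.subn("", xml_text, count=1)
    return new_text, count > 0


def fix_file(path: str) -> bool:
    """Rewrite `path` in place if a stray DOCTYPE was found (assumes UTF-8)."""
    # newline="" keeps the file's original line endings on read and write.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    fixed, removed = strip_stray_doctype(text)
    if removed:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(fixed)
    return removed
```

Run against the saved file just before check-in, or as a batch pass over documents already in the repository, the returned flag would also show how often, and in which files, the problem occurs, which might help narrow down when the insertion happens.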


I'm fairly sure there is no case where Arbortext Editor puts the
doctype declaration after the comment line,
so I'm guessing the doctype line is being added by some other
part of the check-in process. If this is the case, then it's
not going to help figure out how to "search for and delete the
doctype declaration before check-in" while in the Editor.

As far as finding and deleting the doctype declaration when
bringing the Editor up on an already corrupted document, you
might be able to write an ACL hook to open it in untagged
(text) mode and delete it, but that might take a bit of doing.

Hopefully, you'll be able to figure out where it's getting
added and fix that, but meanwhile, you might just explain to
users that they should "Open as text" and delete the doctype
declaration, File -> Save, then File -> Revert to Saved to
have it reopened as an XML document.

paul

Unfortunately, I've got nothing.

Other than the observation that we have a sneaky problem that occurs very
rarely (and is, of course, impossible, so far, to reproduce). We have an
internal debate going as to whether:

A) Documentum does a bad thing on checkin.
B) Documentum does a bad thing on checkout.

Authors, as in your situation, are alerted to a problem when they load XML
into Editor. (The result is a document with child references that are not
fully resolved and replaced with their XML.)

You sound pretty confident that Arbortext is responsible. Is there a chance
that your content management is XML-aware and mishandling the XML somehow?

The problem is not manifesting itself within the Documentum workflow.

The problem only occurs within our DITA (schema) based work flow which
only uses Oracle iFS.

I own all the check-in code (ACL/JavaScript/Java) for this work flow
except for Arbortext's servlet code, which remains unavailable to us.
(I'm confident the servlet code is not causing this; it would have to be
pretty sophisticated and aware of our custom/doctypes tree.) I'm almost
certain my check-in code is not performing an insertion /per se/;
however, I'll have to dig through it before I can make that assertion
with 100% confidence. It's not the kind of action I would take, much
less be unaware of.

I'm thinking the problem is something happening during the editing
session that is a non-event to the writer, so they don't contact us
saying "some window popped up, or something happened". Something must
be triggering the treatment of the instance as a doctype instead of a
schema. It may be a combination of checking out the object, using
Save_As, opening the object from iFS again, and finally checking back in
whatever appears in their window.

We're investigating and I'll share our findings when we pin this down.

Paul and Paul, thank you for your comments.

Paul Nagai wrote:
> Unfortunately, I've got nothing.
>
> Other than the observation that we have a sneaky problem that occurs
> very rarely (and is, of course, impossible, so far, to reproduce). We
> have an internal debate going as to whether:
>
> A) Documentum does a bad thing on checkin.
> B) Documentum does a bad thing on checkout.
>
> Authors, as in your situation, are alerted to a problem when they load
> XML into Editor. (The result is a document with child references that
> are not fully resolved and replaced with their XML.)
>
> You sound pretty confident that Arbortext is responsible. Is there a
> chance that your content management is XML-aware and mishandling the
> XML somehow?
>
> On Tue, Sep 15, 2009 at 9:10 AM, John L. Poole <john.poole@oracle.com> wrote:
> [...]

Just did a test in 5.2(M010).
Used a very simple file that was originally like this:




Opened it in Editor and clicked the Save toolbar button. Got this:





Reopened it in Editor and added a text entity. Got this:




]>


Editor will definitely insert a DOCTYPE declaration without explicitly
telling you it is doing so. But then, where else would the XML-literate
user suppose that a text entity would be stored?
<ouch - bit my tongue-in-cheek>

In any case, Editor will insert such a declaration after the Arbortext
comment line.

My two cents,
Steve Thompson
+1(316)977-0515

It is true that, when an internal subset is needed (e.g., because
there are entities to declare), Arbortext will insert the doctype
declaration (because the internal subset is part of the doctype
declaration), but in Steve's case there is no external subset--that
is, there is no PUBLIC or SYSTEM id whereas in John's case, the
inserted doctype declaration had SYSTEM "SAMDPLEDITA.xsd".

I cannot think of a scenario where the editor would insert a
doctype declaration with a PUBLIC or SYSTEM id.

Also, while the Arbortext comment comes before the doctype
declaration when there is an internal subset (because the
internal subset could be very long, and we want the comment
near the top), the editor puts the comment after the doctype
declaration when there is no internal subset, and John's case
had no internal subset.

So I'm still dubious that the editor is inserting that doctype
declaration. (Not that it couldn't be the case, but I'd certainly
look for other culprits.)

paul
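Paul's reasoning can be turned into a rough check on an affected file. The sketch below encodes his description as a heuristic: a declaration that Editor adds to hold an internal subset has no SYSTEM or PUBLIC id and follows the Arbortext comment, so a declaration with an external id, like John's, points to some other culprit. It is not an Arbortext API; DoctypeInfo, classify_doctype, and looks_editor_generated are hypothetical names, comment_before checks for any comment rather than the specific Arbortext comment, and the regular expressions assume that quoted literals contain no ">" or "[" and that entity values contain no "]".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Name, optional external id (SYSTEM/PUBLIC ...), optional internal subset [...].
# Simplified: quoted literals must not contain ">" or "[", entity values no "]".
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+(?P<name>[^\s\[>]+)"
    r"(?:\s+(?P<extid>(?:SYSTEM|PUBLIC)\s+[^\[>]*?))?"
    r"\s*(?P<subset>\[.*?\])?\s*>",
    re.DOTALL,
)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


@dataclass
class DoctypeInfo:
    name: str
    external_id: Optional[str]   # e.g. 'SYSTEM "SAMDPLEDITA.xsd"', or None
    has_internal_subset: bool
    comment_before: bool         # any comment appears before the DOCTYPE

    def looks_editor_generated(self) -> bool:
        # Heuristic from the thread: Editor adds a DOCTYPE only to hold an
        # internal subset, with no SYSTEM/PUBLIC id, and puts its comment first.
        return (self.external_id is None
                and self.has_internal_subset
                and self.comment_before)


def classify_doctype(xml_text: str) -> Optional[DoctypeInfo]:
    """Describe the first DOCTYPE declaration in xml_text, or None if absent."""
    m = DOCTYPE_RE.search(xml_text)
    if m is None:
        return None
    first_comment = COMMENT_RE.search(xml_text)
    return DoctypeInfo(
        name=m.group("name"),
        external_id=m.group("extid"),
        has_internal_subset=m.group("subset") is not None,
        comment_before=(first_comment is not None
                        and first_comment.start() < m.start()),
    )
```

Applied to John's example, the SYSTEM id alone makes looks_editor_generated() return False, which matches Paul's suggestion to look for other culprits; applied to an internal-subset declaration like the one in Steve's test, it returns True.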

Although we're still scratching our heads and trying to create a
reproducible case, we think the problem might occur when a writer checks
out an object and performs a Save_As to the local file system. (We
disabled Save_As many years ago so that writers had to work only with
objects, but we had to enable it again to accommodate a team in
transition.) I'm wondering whether Editor caches values when opening
objects from a content repository, so that when the unexpected happens,
e.g. a writer uses Save_As, the cache is not fully cleared or updated,
and a subsequent opening of an object in the same session uses a stale
value.

Some findings:

We did witness a session of Editor prompting the "Invalid Schema/DTD
file" against a particular object, yet in another session of Editor, the
object opened as it should. This suggests it was not the object that
was tainted, but that the session opening it was affected.

For instance, we saw the "Invalid Schema/DTD file" dialog offer the most
recent Save_As directory as the default location in which to browse for
the DTD, which suggested that some cached directory value was being
altered.

Grosso, Paul wrote:
> [...]