xsl question: converting &lt; to < (and disabling output escaping if necessary)

naglists
1-Newbie

Hi,
Support has said that datamerge is working as designed by returning <
(as stored in the database) as &lt; and that it is up to me to convert
markup from &lt; to < on output via the XSLT. That conversation is ongoing.

BUT if one of you XSL gurus knows how to do this, maybe I can leave support
alone.

I have tried swapping the overall output method to text (that was a hail
mary) and believe this is the "best" answer but it fails:
<xsl:value-of select="translate(self::node(),'&lt;','&lt;')" disable-output-escaping="yes"/>

The following "works" in that it returns the string to Editor with a ! where
there should be a < so the select and the translate are working:
<xsl:value-of select="translate(self::node(),'&lt;','!')" disable-output-escaping="yes"/>

All attempts to use the < fail with the following:
FATAL ERROR:
javax.xml.transform.TransformerException: Error reported by XML parser.

ERROR:
Failed to update query: (my query name)
com.arbortext.epic.datamerge.DataMergeException: Failed to construct XSL
Stylesheet transformer.



Any thoughts?

(FWIW: Michael Kay can be found responding to this question (more or less),
in his own inimitable way, "Fix the #$#% program generating the bad
markup!")

--
Paul Nagai
14 REPLIES

Hi Paul-



It looks like you and/or the application (or maybe both 😉) are
confusing markup strings with XML elements. I don't think simply
swapping out "&lt;" with "<" is going to get you where you want to
go.



I think you will have two options here:



1) If the markup you are reading from the database (which is
getting returned as markup strings with entity replacement) is fairly
simple, you may be able to write a template to convert the markup to
elements in the document. That would look something like this (untested,
debugging left as an exercise):



<xsl:template name="markupToNodes">
  <xsl:param name="markup" select="."/>
  <!-- Tag name: the text between the first '<' and the first '>' -->
  <xsl:variable name="tagname"
      select="substring-after(substring-before($markup,'&gt;'),'&lt;')"/>
  <xsl:element name="{$tagname}">
    <!-- Content: the text between the first '>' and the next '<' -->
    <xsl:copy-of select="substring-before(substring-after($markup,'&gt;'),'&lt;')"/>
  </xsl:element>
</xsl:template>



This code assumes that each entry is going to be a single element with
no attributes and only PCDATA as content. The more complicated your
chunks of markup can be, the more complicated the template will have to
be to handle it. If you are skillful with Google searching, you might be
able to find something online that already solves this problem (but I
don't know for sure, not having done the exercise myself).



2) You can keep the results as escaped markup strings as far as
Data Merge is concerned, and then fix it up post-hoc in Arbortext using
some kind of callback to massage the markup strings into parsed content.



Which of those options you choose will depend on the complexity of the
markup chunks you have to deal with, and your relative comfort level
with XSLT vs. ACL.



--Clay



Clay Helberg

Senior Consultant

TerraXML


Hi Paul,

I don't know how the XSLT you're running fits into the rest of your process, but if it's possible for you, another solution is to change the output method on your XSLT. An output of text <xsl:output method="text"/> should avoid any XML parsing errors.

Alternatively, depending on the rest of your process, you could also use ACL to read and update the file before you pass it to your XSLT. The open(), getline(), replace() and put() functions in Epic work very quickly in my experience - definitely faster than an XSLT for manipulating text the way you want.

Of course with either of these methods you lose the reassurance of always having well-formed output. Also, you'll want to watch for actual < and > tags in the data by screening for " < " and "> " and leaving them as they appear.

Cheers,

Dugald

Just worked on a similar situation yesterday. Assuming you are using an XSLT 2 processor you should be able to use something similar to the line below.

<xsl:value-of disable-output-escaping="yes" select="replace($INPUT, '&lt;', '&lt;')"/>

Hey, Paul...

I agree with Clay's assessment. Within the XSLT, these are just
strings of characters. Your attempts to use method="text" or
disable-output-escaping are most likely being completely ignored
because the XSLT processor is being directed to place the result in a
DOM, which has no good way to represent those directives.

Unfortunately, this means that you're going to need an XML parser
somewhere in the chain. Given the choice between attempting to parse
XML with XSLT and writing some ACL code to capture the
insertion/update of the query result and use Editor's built-in parsing
facilities, I'd go with the latter.

Let us know how it goes, whichever route you choose.

-Brandon 🙂


Alessio
15-Moonstone
(To:naglists)

I apologize in advance since I did not have (nor will I have in the short term) a chance to have a look at the specific use case.
But since one suggestion is to insert an XML parser in the chain I wanted to put an additional option on the table.
By creating a new Datamerge source class (by inheriting the proper Java interface) it would be possible to do all the "hard" work in Java (use the XML parser there, upstream) and have that Java class provide the 'final' DOM node back to Datamerge, downstream.
Creating a new Datamerge source is not a big deal, not sure if it is documented though.

Thanks all for your feedback and suggestions. While I am not an XSL guru by
any means, I have a fair to middling understanding of the difference between
text strings and parsed/parsable markup. What I don't understand is whether
Arbortext intends datamerge to be able to return markup and if so, how. I
said as much in my call, that changing text to markup in XSL is not
generally recognized (again, not a guru ... reporting only what I have
googled) as a good idea except for some very specific use-cases.

Anyhow, to review, the datamerge configuration includes:
a) An ODBC Source
b) A .dmf file that contains the data source definition, query definition,
and connection definition.
c) A .dcf reference to the .dmf.
d) An XSL stylesheet that processes the results of the datamerge prior to
insertion into the XML instance in which the datamerge is being performed
and stored.

Bonus: A trip through the content pipeline.

I am trying to fix the "problem" of < that have been escaped into &lt; along
the way in the XSL stylesheet. My sole act, once a - d above have been
configured, is to update the datamerge in Editor (Tools > Data Merge >
Update). I believe that d, the XSL stylesheet is the last handler of the
datamerge results prior to insertion in Editor, but I guess I don't know
that.

Here comes the latest on where things stand:

1) PTC/Arbortext/Paul Grosso supplied a completed template very similar to
the one Clay suggested. Ignore the <xsl:text>ASDF and <xsl:text>QWERTY
lines. Those were just to validate that the template was being called,
called properly, and that it was recursing properly, which it was.
Unfortunately, neither the template as provided, nor any "gaming" of the
disable-output-escaping value, nor replacing the &lt; with a literal <
resulted in the output I am after.
<xsl:template match="text()" name="unescape-markup">
  <xsl:param name="text" select="."/>
  <xsl:choose>
    <xsl:when test="contains($text,'&lt;')">
      <xsl:text>ASDF</xsl:text>
      <xsl:value-of select="substring-before($text,'&lt;')"/>
      <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
      <xsl:value-of select="substring-before(substring-after($text,'&lt;'),'&gt;')"/>
      <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
      <xsl:call-template name="unescape-markup">
        <xsl:with-param name="text" select="substring-after($text,'&gt;')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>QWERTY</xsl:text>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>


2) replace(self::node(),'<','!') is failing with this error:
ERROR:
com.icl.saxon.expr.XPathException: Error in expression
replace(self::node(),'<','!'):
Unknown system function: replace

Note: I use ! rather than < to avoid masking a problem with fn:replace with
a problem with <.

I guess datamerge uses a Saxon processor that doesn't support the function.
Further, I don't see how this would be any different, functionally, from
translate(), which Saxon does recognize. It would, I suspect, still fail
to output the < or to compile.


3) Changing the xsl:output method from xml to text has no impact. The markup
is returned with the < as &lt;.


4) PTC/Arbortext/Support suggested examining jdbcSource.ent in
Editor/datamerge folder.

While I think I understand the comments well enough to produce the markup
required to implement parseResultSet, I am not clear on where I am supposed
to put it.

I tried adding the following markup:
<parameterref name="p_parseornot" nameref="parseResultSet">
<value>true</value>
</parameterref>

To a variety of locations within my .dmf. When I added it as a child of:
<parametergroupref name="queryParameters" nameref="sqlParameters">

in my .dmf, it changed the results. The content was returned without ANY
markup at all.

If parseResultSet is set to <value>false</value>, the datamerge fails with
this error:
<heading>Update current query</heading>
<record date="Thu Feb 24 14:12:08 2011" millis="1298585528" severity="info" suppress="1">
<level>MESSAGE</level>
<message>Load datamerge configuration file(s)</message>
</record>
<record date="Thu Feb 24 14:12:08 2011" millis="1298585528" severity="info" suppress="1">
<level>MESSAGE</level>
<message>Construct datamerge controller</message>
</record>
<record date="Thu Feb 24 14:12:08 PST 2011" millis="1298585528884" severity="error" suppress="0">
<level>ERROR</level>
<message>com.arbortext.epic.datamerge.DataMergeException:com.arbortext.epic.datamerge.DataMergeException
Unexpected exception : 1</message>
<context class="com.arbortext.epic.internal.datamerge.controller.EpicDataMergeControllerWrapper"
lineNumber="478" method="updateEpicQuery"
systemId="EpicDataMergeControllerWrapper.java"></context>
</record>

If I omit it (as was the original case), markup is returned but the < are
output as &lt; (the original problem).

I have asked Support:
Is my markup correct?
Have I put this in the right location?


5) Current suspicions / possible wrinkles:
a) Datamerge is not the final step of the content pipeline so IF my
stylesheets are doing what they are supposed to do, that is somehow being
undone in a subsequent step within the pipeline or by the pipeline itself.
b) My XSL stylesheet is itself being transformed and passed into the
pipeline and/or datamerge so not all of my options are possible. For
example, if I change this line:
<xsl:text disable-output-escaping="yes">&lt;</xsl:text>
To:
<xsl:text disable-output-escaping="yes"><</xsl:text>
I get the following error:
FATAL ERROR:
javax.xml.transform.TransformerException: Error reported by XML parser
ERROR
Failed to update query (Fields in freeform paras.(Field)
com.arbortext.epic.datamerge.DataMergeException: Failed to construct XSL
Stylesheet transformer.


Once again, thanks for all your time and thoughts on the subject. I will
keep you posted of any new developments.

PTC has opened a SPR 2056615 against the other problem I reported: Where
mixed data (across rows) can result in an empty cell. My original example
was something like:

Memory
512 MB
4 GB
12

The 12 is not returned when an Excel datasource is being queried.

Hi Alessio,
I think information on how to create a new Datamerge source class,
especially if it is not a big deal, would be really valuable. Is that
something you could share here?

Hi Paul-



Sorry, I think I must have failed to explain my point of view clearly
enough, because the XSL template you included below is actually not at
all like the one I suggested. The template you included here is
basically just trying to do the same kind of "&lt;" to "<" substitution
you've tried by other means, to no avail. If you want document elements
to come out of your stylesheet based on the text stream that comes in,
you will have to teach the stylesheet to parse it. As long as the markup
you are getting from the database isn't too complex, this ought to be
possible.



On the bright side, I think I can help with the parseResultSet
parameter. I was able to make it work in my DMF file. It goes inside the
<query> element, something like this:



<query name="employee_list" querytype="table">
  <label>Insert Full Employee List </label>
  <sourceref name="r1" nameref="MS_Access_Source">

  <parameterref name="p_statement" nameref="sqlStatement">
    <documentation>This is a source filter that retrieves data from an
    Excel database using jdbc-odbc bridge.</documentation>
    <value>select * from [Employees$]</value>
  </parameterref>
  <parameterref name="parseyes" nameref="parseResultSet">
    <documentation>This parameter requests that the markup containing
    the query results be parsed.</documentation>
    <value>true</value>
  </parameterref>
  .....



I'll attach a copy of the full DMF file, since it's the same one I used
for my 2009 PTC/USER presentation. With this, when I changed one of the
cells of my Excel spreadsheet to include markup, e.g. "John", I
got the corresponding element in my merged table (and it was an actual
document element, not just the markup string).



One important thing to note is that, if you turn parseResultSet on, then
in your XSLT stylesheet, make sure you are using <xsl:copy-of> to grab
the contents of the result set cells instead of <xsl:value-of>. The
latter will strip out any elements and just return the concatenated text
nodes of the context element, which is the exact opposite of what you are trying
to do!
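
To make the copy-of versus value-of difference concrete, here is a minimal sketch. The element names (cell, entry, plain, b) are made up for illustration and are not the names Data Merge actually uses in its result set; it also assumes the cell content has already been parsed into elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <cell>Name: <b>John</b></cell> -->
  <xsl:template match="cell">
    <entry>
      <!-- copy-of keeps the child elements: Name: <b>John</b> -->
      <xsl:copy-of select="node()"/>
    </entry>
    <plain>
      <!-- value-of keeps only the concatenated text: Name: John -->
      <xsl:value-of select="."/>
    </plain>
  </xsl:template>

</xsl:stylesheet>
```

The first wrapper ends up with a real b element inside it; the second has only text, which is why value-of appears to "strip" the markup.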



Oh, and for future reference, replace() is an XSLT 2.0 function, so to
make it work you would have to change your version attribute on the
<xsl:stylesheet> element to "2.0" instead of "1.0". Then, if you're
using Arbortext 5.4, the Saxon engine should recognize and execute that
function. Of course, as we've already established, that's not going to
do what you want right now, since escaped characters really aren't the
issue here. But I thought that bit of information might come in handy
for some other purpose someday.... 😉
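
For reference, a minimal sketch of replace() in a stylesheet declared as version 2.0. It mirrors the translate() test from earlier in the thread by swapping < for !, and it assumes an XSLT 2.0 capable processor is actually the one running the stylesheet, which this thread does not confirm for Data Merge.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy elements, attributes and other nodes through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Text nodes only. In the select attribute below, the escaped
       less-than sign is the one-character string "<" once the stylesheet
       is parsed. replace() treats its second argument as a regular
       expression. The explicit priority avoids a conflict with the
       identity template above. -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="replace(., '&lt;', '!')"/>
  </xsl:template>

</xsl:stylesheet>
```

Under a 1.0-only processor this fails with an "unknown function" error like the one Paul reported, regardless of the version attribute.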



--Clay





Clay Helberg

Senior Consultant

TerraXML


No XSLT expert myself, and I haven't been paying full attention to this thread (commercial rents are down, lease is up, we're moving uptown), but if running an XSLT 2 processor you can use use-character-maps="[name]" in your output declaration:


<xsl:output omit-xml-declaration="no" method="xml" encoding="utf-8" indent="no" use-character-maps="doc-entities"/>


<xsl:character-map name="doc-entities">
<xsl:output-character character="&#167;" string="&amp;#167;"/>
<xsl:output-character character="&#8211;" string="&amp;#8211;"/>
<xsl:output-character character="&#8212;" string="&amp;#8212;"/>
</xsl:character-map>


...


Only used it for numeric character entities so far --not general entities --but it does seem to guarantee output format for html and xml.


- Lou


Lou Argyres
Continuing Education of the Bar - California
Oakland, CA
Lou.Argyres@ceb.ucla.edu

Ok, my datamerge is now returning XML to Editor! Whoo hoo!

Setting parseResultSet to true in the DMF does have the desired result BUT
you must also change the top level template in the sample XSL from:
<xsl:template match="/">

To:
<xsl:template match="@*|node()">

OR include a template for each element that might be returned by the query.
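
A sketch of that second option, one template per expected element, assuming the parsed result can contain elements named b and i (hypothetical names; substitute whatever your query actually returns):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level template left as in the sample -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Hypothetical element names: copy each one, with its attributes,
       and keep processing its children -->
  <xsl:template match="b | i">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Any element without a matching template falls through to the built-in rules, which keep its text but drop its tags, so this approach only preserves the elements you list.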

I wasn't paying attention to the XSL ... I was assuming something was wrong
with the Datamerge itself. This XSL "error" exists in the hardware.xsl
packaged in the Excel sample, so when I tried implementing parseResultSet
before (based on scant documentation in one of the .ent files support
pointed me to), while it changed my output, it did not have the desired
result. Clay's assertion that he'd gotten it to work allowed me to look
beyond Datamerge and look more closely at the XSL (and take another look at
the raw XML being returned).

There is still a problem with the way that numeric data is handled, but the
SPR is open and will hopefully be addressed in a release soon.

Thanks everyone.

Doh! From:
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

To:
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>


Better make that...

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

...or you'll lose your attributes. When xsl:apply-templates has no select
attribute, the default is select="node()", which does not include
attributes.

One gotcha you may want to keep an eye out for is mishandling of
content in columns that are not intended to be parsed as XML. A stray
"<" or "&" could cause some errors or odd results. Unfortunately, the
"parseResultSet" solution is a broad stroke. Perhaps a future
enhancement to datamerge will provide more granular control, such as
support for the XML datatype standardized with (IIRC) SQL2003.

-Brandon 🙂

Good catch. Noted!
