1-Visitor
February 22, 2011
Question

xsl question: converting &lt; to < (and disabling output escaping if necessary)

Hi,
Support has said that datamerge is working as designed by returning markup
(as stored in the database) with each < escaped as &lt;, and that it is up
to me to convert &lt; back to < on output via the XSLT. That conversation is
ongoing.

BUT if one of you XSL gurus knows how to do this, maybe I can leave support
alone.

I have tried swapping the overall output method to text (that was a hail
mary) and believe this is the "best" answer but it fails:
<xsl:value-of select="translate(self::node(),'&lt;','<')" disable-output-escaping="yes"/>

The following "works" in that it returns the string to Editor with a ! where
there should be a < so the select and the translate are working:
<xsl:value-of select="translate(self::node(),'&lt;','!')" disable-output-escaping="yes"/>

All attempts to use the < fail with the following:
FATAL ERROR:
javax.xml.transform.TransformerException: Error reported by XML parser.

ERROR:
Failed to update query: (my query name)
com.arbortext.epic.datamerge.DataMergeException: Failed to construct XSL
Stylesheet transformer.



Any thoughts?

(FWIW: Michael Kay can be found responding to this question (more or less),
in his own inimitable way, "Fix the #$#% program generating the bad
markup!")

--
Paul Nagai

    14 replies

    18-Opal
    February 22, 2011
    Hi Paul-



    It looks like you and/or the application (or maybe both 😉) are
    confusing markup strings with XML elements. I don't think simply
    swapping out "&lt;" with "<" is going to get you where you want to
    go.



    I think you will have two options here:



    1) If the markup you are reading from the database (which is
    getting returned as markup strings with entity replacement) is fairly
    simple, you may be able to write a template to convert the markup to
    elements in the document. That would look something like this (untested,
    debugging left as an exercise):



    <xsl:template name="markupToNodes">
      <xsl:param name="markup" select="."/>
      <!-- Element name: the text between the first "<" and the first ">" -->
      <xsl:variable name="tagname" select="substring-after(substring-before($markup,'&gt;'),'&lt;')"/>
      <xsl:element name="{$tagname}">
        <!-- Content: the text between the first ">" and the next "<" -->
        <xsl:copy-of select="substring-before(substring-after($markup,'&gt;'),'&lt;')"/>
      </xsl:element>
    </xsl:template>



    This code assumes that each entry is going to be a single element with
    no attributes and only PCDATA as content. The more complicated your
    chunks of markup can be, the more complicated the template will have to
    be to handle it. If you are skillful with Google searching, you might be
    able to find something online that already solves this problem (but I
    don't know for sure, not having done the exercise myself).
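
For readers who want to see the substring logic outside of XSLT, here is a rough Python transliteration of the same idea. It is only a sketch under the same assumptions (one element, no attributes, text-only content); the helper names and the sample cell value are made up for illustration and are not part of Data Merge or any Arbortext API.

```python
import xml.etree.ElementTree as ET


def substring_before(s: str, sep: str) -> str:
    """Mimic XPath substring-before(): empty string when sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s: str, sep: str) -> str:
    """Mimic XPath substring-after(): empty string when sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def markup_to_element(markup: str) -> ET.Element:
    """Turn '<tag>text</tag>' into a real element (no attributes, text only)."""
    tagname = substring_after(substring_before(markup, ">"), "<")
    element = ET.Element(tagname)
    element.text = substring_before(substring_after(markup, ">"), "<")
    return element


cell = "<emphasis>John</emphasis>"  # hypothetical cell value from the database
node = markup_to_element(cell)
print(node.tag, node.text)
print(ET.tostring(node, encoding="unicode"))
```

Anything beyond this simple shape (attributes, nested elements, mixed content) needs a real parser rather than more substring calls.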



    2) You can keep the results as escaped markup strings as far as
    Data Merge is concerned, and then fix it up post-hoc in Arbortext using
    some kind of callback to massage the markup strings into parsed content.



    Which of those options you choose will depend on the complexity of the
    markup chunks you have to deal with, and your relative comfort level
    with XSLT vs. ACL.



    --Clay



    Clay Helberg

    Senior Consultant

    TerraXML


    1-Visitor
    February 23, 2011

    Hi Paul,

    I don't know how the XSLT you're running fits into the rest of your process, but if it's possible for you, another solution is to change the output method on your XSLT. An output of text <xsl:output method="text"/> should avoid any XML parsing errors.

    Alternatively, depending on the rest of your process, you could also use ACL to read and update the file before you pass it to your XSLT. The open(), getline(), replace() and put() functions in Epic work very quickly in my experience - definitely faster than an XSLT for manipulating text the way you want.

    Of course with either of these methods you lose the reassurance of always having well-formed output. Also, you'll want to watch for actual < and > tags in the data by screening for " < " and "> " and leaving them as they appear.

    Cheers,

    Dugald

    12-Amethyst
    February 23, 2011
    Just worked on a similar situation yesterday. Assuming you are using an XSLT 2 processor you should be able to use something similar to the line below.

    <xsl:value-of disable-output-escaping="yes" select="replace($INPUT, '&amp;lt;', '&lt;')"/>
    1-Visitor
    February 24, 2011
    Hey, Paul...

    I agree with Clay's assessment. Within the XSLT, these are just
    strings of characters. Your attempts to use method="text" or
    disable-output-escaping are most likely being completely ignored
    because the XSLT processor is being directed to place the result in a
    DOM, which has no good way to represent those directives.

    Unfortunately, this means that you're going to need an XML parser
    somewhere in the chain. Given the choice between attempting to parse
    XML with XSLT and writing some ACL code to capture the
    insertion/update of the query result and use Editor's built-in parsing
    facilities, I'd go with the latter.
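
To make the string-versus-node distinction concrete, here is a small Python sketch using the standard library's minidom. It is not how Data Merge itself works internally (that is an assumption-free zone for me); it only shows the general behaviour of a DOM: a string added as a text node is escaped on output, while the same string run through a parser becomes a real element. The `cell` and `b` names are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString("<cell/>")
cell = doc.documentElement

# A string placed in a DOM is a text node; the serializer escapes its "<".
cell.appendChild(doc.createTextNode("<b>John</b>"))
as_text = cell.toxml()

# Parsing the string first yields a real element that can be imported.
cell.removeChild(cell.firstChild)
fragment = minidom.parseString("<b>John</b>").documentElement
cell.appendChild(doc.importNode(fragment, True))
as_markup = cell.toxml()

print(as_text)    # the markup appears escaped
print(as_markup)  # the markup appears as a child element
```

No output-escaping switch is involved in the second case, which is why a parser somewhere in the chain does what character substitution cannot.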

    Let us know how it goes, whichever route you choose.

    -Brandon 🙂


    15-Moonstone
    February 24, 2011
    I apologize in advance since I did not have (nor will I have in the short term) a chance to have a look at the specific use case.
    But since one suggestion is to insert an XML parser in the chain I wanted to put an additional option on the table.
    By creating a new Datamerge source class (by inheriting the proper Java interface) it would be possible to do all the "hard" work in Java (use the XML parser there, upstream) and have that Java class provide the 'final' DOM node back to Datamerge, downstream.
    Creating a new Datamerge source is not a big deal, not sure if it is documented though.

    naglists (Author)
    1-Visitor
    February 25, 2011
    Thanks all for your feedback and suggestions. While I am not an XSL guru by
    any means, I have a fair to middling understanding of the difference between
    text strings and parsed/parsable markup. What I don't understand is whether
    Arbortext intends datamerge to be able to return markup and if so, how. I
    said as much in my call, that changing text to markup in XSL is not
    generally recognized (again, not a guru ... reporting only what I have
    googled) as a good idea except for some very specific use-cases.

    Anyhow, to review, the datamerge configuration includes:
    a) An ODBC Source
    b) A .dmf file that contains the data source definition, query definition,
    and connection definition.
    c) A .dcf reference to the .dmf.
    d) An XSL stylesheet that processes the results of the datamerge prior to
    insertion into the XML instance in which the datamerge is being performed
    and stored.

    Bonus: A trip through the content pipeline.

    I am trying to fix the "problem" of < that have been escaped into &lt; along
    the way in the XSL stylesheet. My sole act, once a - d above have been
    configured, is to update the datamerge in Editor (Tools > Data Merge >
    Update). I believe that d, the XSL stylesheet, is the last handler of the
    datamerge results prior to insertion in Editor, but I guess I don't know
    that.

    Here comes the latest on where things stand:

    1) PTC/Arbortext/Paul Grosso supplied a completed template very similar to
    the one Clay suggested. Ignore the <xsl:text>ASDF and <xsl:text>QWERTY
    lines. Those were just to validate that the template was being called,
    called properly, and that it was recursing properly, which it was.
    Unfortunately, neither the template as provided nor any "gaming" of the
    disable-output-escaping value OR replacing the &lt; with a literal <
    resulted in the output I am after.
    <xsl:template match="text()" name="unescape-markup">
      <xsl:param name="text" select="."/>
      <xsl:choose>
        <xsl:when test="contains($text,'&lt;')">
          <xsl:text>ASDF</xsl:text>
          <xsl:value-of select="substring-before($text,'&lt;')"/>
          <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
          <xsl:value-of select="substring-before(substring-after($text,'&lt;'),'&gt;')"/>
          <xsl:text disable-output-escaping="yes">&gt;</xsl:text>
          <xsl:call-template name="unescape-markup">
            <xsl:with-param name="text" select="substring-after($text,'&gt;')"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>QWERTY</xsl:text>
          <xsl:value-of select="$text"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>


    2) replace(self::node(),'<','!') is failing with this error:
    ERROR:
    com.icl.saxon.expr.XPathException: Error in expression
    replace(self::node(),'<','!'):
    Unknown system function: replace

    Note: I use ! rather than < to avoid masking a problem with fn:replace with
    a problem with <.

    I guess datamerge uses a Saxon processor that doesn't support the function.
    Further, I don't see how this would be any different, functionally, from
    translate(), which is recognized by Saxon. It would, I suspect, still fail
    to output the < or to compile.


    3) Changing the xsl:output method from xml to text has no impact. The markup
    is returned with the < as &lt;.


    4) PTC/Arbortext/Support suggested examining jdbcSource.ent in
    Editor/datamerge folder.

    While I think I understand the comments well enough to produce the markup
    required to implement parseResultSet, I am not clear on where I am supposed
    to put it.

    I tried adding the following markup:
    <parameterref name="p_parseornot" nameref="parseResultSet">
    <value>true</value>
    </parameterref>

    To a variety of locations within my .dmf. When I added it as a child of:
    <parametergroupref name="queryParameters" nameref="sqlParameters">

    in my .dmf, it changed the results. The content was returned without ANY
    markup at all.

    If parseResultSet is set to <value>false</value>, the datamerge fails with
    this error:
    <heading>Update current query</heading>
    <record date="Thu Feb 24 14:12:08 PST 2011" millis="1298585528" severity="info" suppress="1">
    <level>MESSAGE</level>
    <message>Load datamerge configuration file(s)</message>
    </record>
    <record date="Thu Feb 24 14:12:08 PST 2011" millis="1298585528" severity="info" suppress="1">
    <level>MESSAGE</level>
    <message>Construct datamerge controller</message>
    </record>
    <record date="Thu Feb 24 14:12:08 PST 2011" millis="1298585528884" severity="error" suppress="0">
    <level>ERROR</level>
    <message>com.arbortext.epic.datamerge.DataMergeException:com.arbortext.epic.datamerge.DataMergeException
    Unexpected exception : 1</message>
    <context class="com.arbortext.epic.internal.datamerge.controller.EpicDataMergeControllerWrapper"
    lineNumber="478" method="updateEpicQuery"
    systemId="EpicDataMergeControllerWrapper.java"></context>
    </record>

    If I omit it (as was the original case), markup is returned but the < are
    output as &lt; (the original problem).

    I have asked Support:
    Is my markup correct?
    Have I put this in the right location?


    5) Current suspicions / possible wrinkles:
    a) Datamerge is not the final step of the content pipeline so IF my
    stylesheets are doing what they are supposed to do, that is somehow being
    undone in a subsequent step within the pipeline or by the pipeline itself.
    b) My XSL stylesheet is itself being transformed and passed into the
    pipeline and/or datamerge so not all of my options are possible. For
    example, if I change this line:
    <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
    To:
    <xsl:text disable-output-escaping="yes"><</xsl:text>
    I get the following error:
    FATAL ERROR:
    javax.xml.transform.TransformerException: Error reported by XML parser
    ERROR
    Failed to update query (Fields in freeform paras.(Field)
    com.arbortext.epic.datamerge.DataMergeException: Failed to construct XSL
    Stylesheet transformer.


    Once again, thanks for all your time and thoughts on the subject. I will
    keep you posted of any new developments.
    naglists (Author)
    1-Visitor
    February 25, 2011
    PTC has opened an SPR 2056615 against the other problem I reported: Where
    mixed data (across rows) can result in an empty cell. My original example
    was something like:

    Memory
    512 MB
    4 GB
    12

    The 12 is not returned when an Excel datasource is being queried.
    naglists (Author)
    1-Visitor
    February 25, 2011
    Hi Alessio,
    I think information on how to create a new Datamerge source class,
    especially if it is not a big deal, would be really valuable. Is that
    something you could share here?

    18-Opal
    February 25, 2011
    Hi Paul-



    Sorry, I think I must have failed to explain my point of view clearly
    enough, because the XSL template you included below is actually not at
    all like the one I suggested. The template you included here is
    basically just trying to do the same kind of "&lt;" to "<" substitution
    you've tried by other means, to no avail. If you want document elements
    to come out of your stylesheet based on the text stream that comes in,
    you will have to teach the stylesheet to parse it. As long as the markup
    you are getting from the database isn't too complex, this ought to be
    possible.



    On the bright side, I think I can help with the parseResultSet
    parameter. I was able to make it work in my DMF file. It goes inside the
    <query> element, something like this:



    <query name="employee_list" querytype="table">
    <label>Insert Full Employee List </label>
    <sourceref name="r1" nameref="MS_Access_Source">

    <parameterref name="p_statement" nameref="sqlStatement">
    <documentation>This is a source filter that retrieves data from an
    Excel database using jdbc-odbc bridge.</documentation>
    <value>select * from [Employees$]</value>
    </parameterref>
    <parameterref name="parseyes" nameref="parseResultSet">
    <documentation>This parameter requests that the markup containing
    the query results be parsed.</documentation>
    <value>true</value>
    </parameterref>
    .....



    I'll attach a copy of the full DMF file, since it's the same one I used
    for my 2009 PTC/USER presentation. With this, when I changed one of the
    cells of my Excel spreadsheet to include markup, e.g. "John", I
    got the corresponding element in my merged table (and it was an actual
    document element, not just the markup string).



    One important thing to note is that, if you turn parseResultSet on, then
    in your XSLT stylesheet, make sure you are using <xsl:copy-of> to grab
    the contents of the result set cells instead of <xsl:value-of>. The
    latter will strip out any elements and just return the concatenated text
    nodes of the context element, the exact opposite of what you are trying
    to do!
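
As a loose analogy outside XSLT, the Python sketch below contrasts the two behaviours on a parsed cell: taking the string value (roughly what value-of gives you) versus making a deep copy (roughly what copy-of gives you). The `cell` and `b` element names are hypothetical, and the sketch copies the whole cell element for brevity rather than only its children.

```python
import copy
import xml.etree.ElementTree as ET

cell = ET.fromstring("<cell>Name: <b>John</b> Smith</cell>")

# Roughly what xsl:value-of yields: the string value, with elements dropped.
string_value = "".join(cell.itertext())

# Roughly what xsl:copy-of yields: a deep copy with child elements intact.
node_copy = copy.deepcopy(cell)

print(string_value)
print(ET.tostring(node_copy, encoding="unicode"))
```

The first print shows only text; the second still contains the `b` element.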



    Oh, and for future reference, replace() is an XSLT 2.0 function, so to
    make it work you would have to change your version attribute on the
    <xsl:stylesheet> element to "2.0" instead of "1.0". Then, if you're
    using Arbortext 5.4, the Saxon engine should recognize and execute that
    function. Of course, as we've already established, that's not going to
    do what you want right now, since escaped characters really aren't the
    issue here. But I thought that bit of information might come in handy
    for some other purpose someday.... 😉



    --Clay





    Clay Helberg

    Senior Consultant

    TerraXML


    1-Visitor
    February 25, 2011

    No XSLT expert myself and haven't been paying full attention to this thread (commercial rents are down, lease is up, we're moving uptown), but if running an XSLT 2 processor you can use use-character-maps="[name]" in your output declaration:


    <xsl:output omit-xml-declaration="no" method="xml" encoding="utf-8" indent="no" use-character-maps="doc-entities"/>


    <xsl:character-map name="doc-entities">
    <xsl:output-character character="§" string="&amp;#167;"/>
    <xsl:output-character character="–" string="&amp;#8211;"/>
    <xsl:output-character character="—" string="&amp;#8212;"/>
    </xsl:character-map>


    ...


    Only used it for numeric character entities so far, not general entities, but it does seem to guarantee output format for html and xml.
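
For anyone unfamiliar with character maps, the idea is simply "when writing the output, swap this character for that string". A minimal Python sketch of the same substitution is below; the map contents and function name are my own illustration, not anything from Saxon or Arbortext.

```python
# Hypothetical map mirroring the "doc-entities" idea: character -> output string.
DOC_ENTITIES = {
    "\u00a7": "&#167;",   # section sign
    "\u2013": "&#8211;",  # en dash
    "\u2014": "&#8212;",  # em dash
}


def apply_character_map(serialized: str, char_map: dict) -> str:
    """Replace each mapped character with its string, leaving the rest alone."""
    return serialized.translate(str.maketrans(char_map))


result = apply_character_map("See \u00a7 12, pages 3\u20135", DOC_ENTITIES)
print(result)
```

Note that this happens at serialization time, so like disable-output-escaping it would only matter if the stylesheet result is actually written out as text rather than handed on as a DOM.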


    - Lou


    Lou Argyres
    Continuing Education of the Bar - California
    Oakland, CA
    Lou.Argyres@ceb.ucla.edu
