mysterious space in HTML output after XSL conversion

c.strickland

I'm having a really strange problem after converting an XML document to
HTML using XSL.

Whenever I select certain XML title elements in the stylesheet (such as
a procedure title), a space appears before the title in the HTML output.
This does not happen with all titles in our XML document. (Process
titles do not have a leading space in HTML output.) In addition, none of
the titles have a leading space in the XML document.

For example, this:



    Yields the following text with a leading space in output:

    * Procedure With Subprocedures Title

    I wrote some JavaScript to get rid of the leading space because I could
    not get rid of it using any xslt/xpath functions, such as
    normalize-space():

    function trimLeadingSpace(str) {
      // Matches the first ordinary space character (U+0020) in the string
      var leadingSpace = / /;
      return str.replace(leadingSpace, "");
    }



    However, the JavaScript did not work until I copied the leading space in
    the HTML text and pasted it in the applicable JavaScript code:

    var leadingSpace = / /;  // the character between the slashes was pasted from the HTML output

    Then the javascript worked and the leading space went away from the
    title.

    What's going on? Is the leading space actually some sort of character
    that I can't see? The XML I'm transforming is the product of a B2B XML
    to XML conversion using an XSL stylesheet. I wonder if some sort of
    character was dumped before title text in certain elements during the
    conversion. If this is the case, how can I avoid that problem in XML to
    XML conversions? If not, what else might be the cause?

    Thank you!

    Chris Strickland, Home Depot Supply, Inc.

    Hi Chris--

    Just a guess, but it may be a newline rather than a space. You might get this if you have the indent="yes" attribute set on the <xsl:output> element in your stylesheet.

    If you have that on, try turning it off and see if that helps. You can also check your generated HTML file: open it in a text editor (or open it in a browser, right-click, and select "View Source"). If you see something like this:


  • My Title


    then that may be the problem.

    --Clay

    Thanks for the advice.

    I tried turning off indent="yes", but unfortunately that did not help. When I look at the HTML source code, it looks like this:

  • Â Procedure With Subprocedures Title

    Hi Chris--

    Looking at the string in Emacs' hex mode, the space there is 0xa0, better known as &nbsp; (the non-breaking space). It is probably there in the original XML before the transformation, and normalize-space() does not treat it as ignorable whitespace. To get rid of it, you can use the translate() function, something like this:

    <xsl:value-of select="translate(node()/_:Title,'&#160;','')"/>

    See if that works better for you.

    --Clay
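
    For anyone without a hex editor handy, a few lines of JavaScript can expose an invisible character like this one. The following is only a sketch: describeChars is a made-up helper name, not something from the thread, and the sample string is shortened for illustration.

```javascript
// Hypothetical helper (not from the thread): list each character's code point
// so that invisible characters such as U+00A0 become visible.
function describeChars(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Made-up sample: a title fragment that starts with a non-breaking space
const title = "\u00A0Proc";
console.log(describeChars(title).join(" "));
// U+00A0 U+0050 U+0072 U+006F U+0063
```

    An ordinary space would show up as U+0020 instead, which is why a regular expression containing a typed space never matched the pasted character.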

    Thank you so much! That solved the problem!

    How can I get rid of that character in the source XML? And what editor did you use to see the string?

    Thanks again!

    Chris

    Hi Chris--

    I used emacs, with hex mode that shows the characters in the file as hexadecimal strings. See http://www.gnu.org/software/emacs/. The "hexl" major mode is standard with emacs, no additional packages are needed. Note that emacs is not for the faint of heart, though--it's a powerful editor, but it has a famously steep learning curve.

    As for getting rid of the character in the source XML, if they are static files, you should be able to just edit them out in the usual way. (In Epic, you can use Find entity to search for "nbsp", and replace each instance. Or you can just paste the character in the normal Find/Replace dialog.) You could also write a pretty simple Perl script to do the replacement.

    If the XML is coming from a CMS or database, you'll have to figure out a way to modify the contents of the system, or at least to modify the output pipeline to strip out the &nbsp; characters before serving the content.

    --Clay
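
    The replacement script mentioned above could look roughly like this in JavaScript rather than Perl (a swapped-in language, chosen to match the code earlier in the thread). replaceNbsp and the sample markup are hypothetical, and the sketch assumes the NBSP appears either as the literal U+00A0 character or as one of the references &nbsp;, &#160; or &#xA0;.

```javascript
// Hypothetical helper (not from the thread): replace every non-breaking space
// in a chunk of XML text, whether it appears as the literal U+00A0 character
// or as one of the references &nbsp;, &#160; or &#xA0;.
function replaceNbsp(xmlText, replacement = " ") {
  const nbsp = /\u00A0|&nbsp;|&#0*160;|&#x0*[aA]0;/g;
  return xmlText.replace(nbsp, replacement);
}

// Made-up sample element for illustration
const sample = "<Title>\u00A0Procedure&#160;With&#xA0;Subprocedures Title</Title>";
console.log(replaceNbsp(sample));
// <Title> Procedure With Subprocedures Title</Title>
```

    Passing an empty string as the second argument deletes the character outright instead of turning it into a plain space. For static files, the same function could be applied to text read with Node's fs.readFileSync and written back with fs.writeFileSync.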

    Good call, Clay. However, I suspect it might be safer to convert the NBSP to a plain space, then use normalize-space (which only recognizes space, tab, CR and LF, but not NBSP, as whitespace) to get rid of extraneous spaces. That way, if you end up with NBSPs separating words, you won't accidentally join them together.

    <xsl:value-of select="normalize-space(translate(node()/_:Title,'&#160;',' '))"/>

    Brandon Ibach
    Lockheed Martin Space Systems
    Cape Canaveral, FL
    321-476-7051
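
    To see why converting before normalizing is the safer order, here is a rough JavaScript analogue of that expression. It is a sketch only: xpathNormalizeSpace and cleanTitle are made-up names, and the first function merely imitates XPath's normalize-space() by treating space, tab, CR and LF (but not NBSP) as whitespace.

```javascript
// Hypothetical imitation of XPath normalize-space(): collapse runs of
// space, tab, CR and LF to one space, then drop a leading/trailing space.
// U+00A0 is deliberately NOT treated as whitespace, as in XPath.
function xpathNormalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// Convert NBSP to a plain space first, then normalize.
function cleanTitle(s) {
  return xpathNormalizeSpace(s.replace(/\u00A0/g, " "));
}

// Made-up sample with a leading NBSP and an NBSP between two words
const raw = "\u00A0Procedure\u00A0With  Subprocedures Title ";

console.log(JSON.stringify(cleanTitle(raw)));
// "Procedure With Subprocedures Title"

// Deleting the NBSP instead joins the words it separated:
console.log(JSON.stringify(xpathNormalizeSpace(raw.replace(/\u00A0/g, ""))));
// "ProcedureWith Subprocedures Title"
```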

    Good point. I was assuming that the initial space was the only instance of &nbsp;--but you know what happens when you assume....

    Chris,
    If you're more the "faint of heart" type when it comes to your text editor, you may find TextPad to be a good choice. It has a hex mode, though it's read-only. I find TextPad to be a pretty good, powerful editor for relatively quick tasks. I do all of my heavy lifting in vim, of course, but its learning curve is at least as steep as emacs's, and more likely steeper.

    Brandon Ibach
    Lockheed Martin Space Systems
    Cape Canaveral, FL
    321-476-7051
