KB Search Highlighting

TomU · ‎Aug 19, 2016

The knowledge base search automatically highlights words in the search results that match the search term. This is good. It also seems to be highlighting partial word matches. This is not good (in my opinion). In the example below, "do" and "working" should not be highlighted. Neither of these is a search term.

PeterCase · ‎Aug 22, 2016

Hello Tom, thanks for calling out this problem.

As you've found, our back end search has lemmatization enabled, and so variants of the same word are returned as search hits.

"Fuzzy" search logic helps increase search effectiveness, as users don't always describe their issue or question in the same way it appears in KB articles, and we want to do what we can to bridge that gap.

The screenshot shows that our search keyword highlighting functionality is also taking the expanded list of terms, and so these are highlighted too.

For more specific keywords, for example "route" or "section", highlighting variants like routed, routes, routing; sections, sectioned, sectioning seems like a sensible thing to do. For the more ubiquitous words like "do" and "work", the benefit is not as clear-cut.

Not wanting to lose the functionality when highlighting variants of more specific terms, the only option we'd have is to introduce exact-match highlighting for a list of words we define. I don't have a cost for this, but it's not likely to be trivial to define and maintain such a list over time, and to build the logic on the search back end, and that's without considering Asian languages.
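To make the exact-match idea concrete, here is a minimal Python sketch. It is not the KB search back end: EXACT_ONLY, crude_stem and highlight are hypothetical names, the word list is invented for illustration, and crude_stem is a toy suffix stripper standing in for a real lemmatizer (English only).

```python
import re

# Hypothetical list of ubiquitous words that are highlighted only on an exact match.
EXACT_ONLY = {"do", "work", "how"}

def crude_stem(word):
    """Toy stand-in for a lemmatizer: strips one common suffix and a trailing 'e'."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def highlight(text, query):
    """Wrap matching words in [brackets]; variants match unless the term is exact-only."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    exact = {t for t in terms if t in EXACT_ONLY}
    stems = {crude_stem(t) for t in terms if t not in EXACT_ONLY}

    def mark(match):
        word = match.group(0)
        lower = word.lower()
        if lower in exact or crude_stem(lower) in stems:
            return "[" + word + "]"
        return word

    return re.sub(r"\w+", mark, text)

print(highlight("Routing stopped working after routes changed", "work route"))
```

With this word list, a search for "work route" would mark "Routing" and "routes" but leave "working" alone, because "work" is only matched exactly.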

Do you (or others reading this) have other examples of how this is causing more harm than good to the user experience with the Search Results Page, so we can better evaluate whether this belongs on the roadmap?

Many thanks.

TomU · ‎Aug 22, 2016

I'm not sure it makes sense to highlight words when trying to also support plain language searching (as my example shows). The eye is immediately drawn to the highlighted text and when the section highlighted has nothing to do with the search term, the entire result is discarded even if the result might be relevant. Perfect example:

While this guide does indeed talk about the topic being searched for, the section that is highlighted has nothing to do with it. Seems like it would make more sense to highlight something like this (this is straight from this guide):

I would much rather see the longer, more specific words being highlighted before simple words such as "how" and "do".

dschenken · ‎Aug 22, 2016

Perhaps returning the results grouped according to the lemmatization would help. Full word results first and highlighted, then partials. For those documents containing both, initially highlight the words with the highest match.
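The grouping idea could look roughly like the following Python sketch. Everything here is an assumption for illustration: group_results, tokens and strip_plural are hypothetical names, and strip_plural is a deliberately naive normalizer standing in for whatever lemmatization the search engine really uses.

```python
import re

def tokens(text):
    """Lower-cased words of a document."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def strip_plural(word):
    """Naive normalizer used only for this example: drops a trailing 's'."""
    return word[:-1] if word.endswith("s") else word

def group_results(docs, term, normalize=strip_plural):
    """Documents containing the exact term first, then those matching only a variant."""
    term = term.lower()
    exact, variant = [], []
    for doc in docs:
        words = tokens(doc)
        if term in words:
            exact.append(doc)
        elif normalize(term) in {normalize(w) for w in words}:
            variant.append(doc)
    return exact + variant

docs = ["Static routes overview", "Add a route", "Unrelated page"]
print(group_results(docs, "route"))
```

Documents that match neither way are dropped, and the original order is kept inside each group.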

One thing that could be done is to maintain an index of all words against which selections could be made. Bounce that against a decent dictionary to repair the gross typos and then against a frequency list to root out minor typos masquerading as legitimate words, but words not typically used.
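A rough Python sketch of that dictionary-plus-frequency check follows. The suggest function, the sample dictionary and the counts are all made up for illustration; difflib.get_close_matches is from the Python standard library, and the 0.7 cutoff and min_count threshold are arbitrary choices.

```python
from difflib import get_close_matches

def suggest(word, dictionary, frequency, min_count=2):
    """Return a likely correction, or None if the word looks fine or nothing is close."""
    word = word.lower()
    if word in dictionary and frequency.get(word, 0) >= min_count:
        return None  # known and commonly used: leave it alone
    # Only offer words that are themselves common enough to be trusted.
    pool = [w for w in dictionary if w != word and frequency.get(w, 0) >= min_count]
    matches = get_close_matches(word, pool, n=1, cutoff=0.7)
    return matches[0] if matches else None

dictionary = {"highlight", "form", "from", "route"}
frequency = {"highlight": 40, "from": 120, "form": 1, "route": 15}  # invented counts

print(suggest("hilight", dictionary, frequency))
print(suggest("from", dictionary, frequency))
```

A word that is in the dictionary but almost never used in the indexed content (here "form") is treated as suspicious and checked against the common words, which is one way to read the "typos masquerading as legitimate words" step.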

The basic problem with word searches is lack of context. Google managed to avoid that by using Page Rank to determine which results were valuable, not just returning all the matches.

For me, unaugmented word searches are bottom of the barrel. Page Rank implicitly involved humans making judgements about worth of content, but that scheme is not available in Help results. It's why a move to Wiki is better; one can see the links that people make to judge the valuable pages and have a place to mark -why- the person felt it was important so anyone following the path gets a synopsis. It also should be used to cut the volume of redundant info, which is what text searches build their vast useless result list from.

If the user has no idea where to start, a word search can be enhanced with page rank based on links.

The main improvement to the bottom-of-barrel search would be to allow the user to specify distance. Neighbor, same sentence, same paragraph; these would limit the results where the searcher expects phrases to be contiguous. Another would be an option to force exact matches. Often I've tried to find info on a specific term and only gotten a result at #35 on the list.
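As an illustration of the distance options, here is a small Python sketch. It does not reflect how the KB search engine implements proximity; within_distance and same_sentence are hypothetical helpers, and sentence splitting on ".", "!" and "?" is a simplification.

```python
import re

def words_of(text):
    """Lower-cased words of a piece of text, in order."""
    return [w.lower() for w in re.findall(r"\w+", text)]

def within_distance(text, first, second, max_gap):
    """True if the two words occur at most max_gap word positions apart."""
    words = words_of(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

def same_sentence(text, first, second):
    """True if both words occur in the same sentence."""
    for sentence in re.split(r"[.!?]+", text):
        words = set(words_of(sentence))
        if first.lower() in words and second.lower() in words:
            return True
    return False

text = "Export the report. Then open the schedule dialog."
print(within_distance(text, "export", "report", 2))
print(same_sentence(text, "export", "schedule"))
```

A "neighbor" search would be max_gap=1, and a "same paragraph" check could reuse same_sentence with a split on blank lines instead of punctuation.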

I'm sure there are much better algorithms than I've suggested.

PeterCase · ‎Aug 24, 2016

Hello David,

Thank you for your suggestion about grouping and prioritising exact matches above lemmatized ones. There are different ways in which our search is prioritising results to ensure that the most relevant bubble up to the top, and so I'll check with our search team to understand if this is being done already, and if not, whether it would produce more relevant results.

You've made some other points about the back end search and the good news is, the engine has all these capabilities already.

I'll respond to them in order, so that others can also benefit:

Typos

Search detects and makes suggestions when a word is mis-typed. Of course we won't catch everything, but you will usually see a "Did you mean ...?" for commonly mistyped words.

Result Ranking

The engine is learning all the time based on daily usage, and is "augmented" (to use your term) to prioritise results based on different factors. Similar to Google, popularity (number of views and likes) is one of the factors.

Proximity of words and literal string search

Both of these are currently possible. Please see the "Advanced Search Techniques" page for details of how to use them.