Thursday 20 November 2008

How MOSS Search Determines Relevance

Relevance is a measure of how well the results returned by a search system meet the user’s needs. With each search request, the system should direct the user to the most relevant items, out of a corpus that may contain millions of items.
SharePoint’s relevance settings determine how each item’s ranking is calculated, and therefore the order in which results appear in a search results list. Microsoft put considerable effort into improving search relevance in Microsoft Office SharePoint Server (MOSS) 2007, which is tuned for searching enterprise content and Line of Business (LOB) application data.
The following factors are used to calculate relevance:

Title and filename. Is the search term in the document’s title? Its
filename?
Metadata. Is the search term in the metadata properties (Site
Columns)?
Search Term Density. What’s the density of that search term within
the document (for example, 20 mentions in a one-page document
versus 20 mentions in a 100-page document)?
Keywords. Words or phrases that you identify as significant to your
organization. A keyword lets you show additional information (and
recommended links) on the first results page for any query containing
that keyword, information that would not otherwise appear in the
search results. Two pieces of information can be displayed for a
keyword: a definition of the term and its best bets, a list of links
identified as highly relevant to that term.
Best Bets. Items tagged manually as best bets show up first in a
search results list, as just described.
Security. SharePoint excludes results that the user does not have
permission to see. This is called security trimming.
Hyperlink Click Distance. SharePoint determines the number of
links between a URL and any of the authoritative sites that a search
administrator specifies. The more links the crawler has to follow
from an authoritative site to reach the content item, the lower the
item’s relevance score.
Anchor Text. This is the text that is included with a hyperlink to
describe the target content of that hyperlink. When the crawler
indexes an item, the anchor text of links pointing to it is added to
that item’s index entry, but it is used only for ranking, not for
matching. For example, if a query matches the anchor text but not the
content of the target item, that item is not included in the results.
URL Depth. Items with shorter (shallower) URLs are placed higher in
the ranking. SharePoint determines “depth” by counting the slashes
in the URL.
URL Text Matching. SharePoint looks for the presence of the text in
the URL itself.
Title Extraction. For Office documents, SharePoint uses the title
property of the document to help return highly relevant content,
provided it’s not the default value (Slide1, Document1, and so on).
Result Collapsing. SharePoint search combines similar results into
one. This prevents, for example, the same document appearing as each
of the first 20 hits.
Language. Users typically want content in their own language.
SharePoint looks at the browser language and defaults to searching in
that language. English is also ranked highly regardless of browser
language.
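The Search Term Density factor is easiest to see with numbers. The sketch below is a minimal illustration, assuming density is simply the number of occurrences divided by the document's word count and that a page holds about 500 words (both are assumptions; the post does not give MOSS's actual formula). The helper name `term_density` is hypothetical.

```python
import re


def term_density(text: str, term: str) -> float:
    """Occurrences of a single-word `term` divided by the number of words in `text`.

    Assumed formula for illustration only; not MOSS's internal calculation.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# Hypothetical documents, assuming roughly 500 words per page.
WORDS_PER_PAGE = 500
one_page = " ".join(["sharepoint"] * 20 + ["filler"] * (1 * WORDS_PER_PAGE - 20))
hundred_pages = " ".join(["sharepoint"] * 20 + ["filler"] * (100 * WORDS_PER_PAGE - 20))

print(f"1-page document:   {term_density(one_page, 'SharePoint'):.4f}")       # 0.0400
print(f"100-page document: {term_density(hundred_pages, 'SharePoint'):.4f}")  # 0.0004
```

The same 20 mentions give a density one hundred times higher in the short document, which is why it would rank above the long one on this factor.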
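Security trimming amounts to filtering the result list against each item's permissions before the user sees it. A minimal sketch, assuming a hypothetical `acl` dictionary that maps each URL to the users and groups allowed to read it (real SharePoint permissions are much richer than this), and a hypothetical `security_trim` helper:

```python
def security_trim(results, user, acl):
    """Keep only results that the user, or one of the user's groups, may read.

    `acl` is a hypothetical dict: URL -> set of user/group names with read access.
    URLs missing from `acl` are dropped (deny by default).
    """
    principals = {user["name"], *user["groups"]}
    return [url for url in results if acl.get(url, set()) & principals]


# Hypothetical permissions and user.
acl = {
    "http://portal/hr/salaries.xlsx": {"HR Managers"},
    "http://portal/hr/leave.docx": {"All Staff"},
    "http://portal/it/vpn.docx": {"All Staff", "IT"},
}
user = {"name": "alice", "groups": ["All Staff"]}

print(security_trim(list(acl), user, acl))
# ['http://portal/hr/leave.docx', 'http://portal/it/vpn.docx']
```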
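Hyperlink Click Distance is a shortest-path question: how many links separate an item from the nearest authoritative site? A multi-source breadth-first search over the link graph answers it. The link graph, the URLs and the `click_distance` helper below are hypothetical; this is only a sketch of the idea, not how the MOSS crawler stores or computes it.

```python
from collections import deque


def click_distance(links, authoritative, target):
    """Fewest links to follow from any authoritative URL to `target`, or None if unreachable.

    `links` maps each page URL to the list of URLs it links to.
    """
    # Every authoritative site starts at distance 0; BFS visits pages in order of distance.
    dist = {url: 0 for url in authoritative}
    queue = deque(authoritative)
    while queue:
        url = queue.popleft()
        if url == target:
            return dist[url]
        for nxt in links.get(url, []):
            if nxt not in dist:
                dist[nxt] = dist[url] + 1
                queue.append(nxt)
    return None


# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "http://portal/": ["http://portal/hr/", "http://portal/it/"],
    "http://portal/hr/": ["http://portal/hr/policies/leave.docx"],
    "http://portal/it/": ["http://portal/it/kb/", "http://portal/hr/"],
    "http://portal/it/kb/": ["http://portal/it/kb/vpn.docx"],
}
authoritative = ["http://portal/"]

for doc in ["http://portal/hr/policies/leave.docx", "http://portal/it/kb/vpn.docx"]:
    print(doc, click_distance(links, authoritative, doc))
# leave.docx is 2 clicks away, vpn.docx is 3, so leave.docx scores higher on this factor.
```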
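The Anchor Text rule (used for ranking, never for matching) can be shown with a toy search function. In this sketch, which uses a hypothetical `Item` type, invented documents and an arbitrary one-point bonus per matching anchor, an item is returned only if its own content contains the term; anchor text merely moves matching items up the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    url: str
    content: str
    anchor_texts: List[str] = field(default_factory=list)  # text of links pointing at this item


def search(items, term):
    term = term.lower()
    scored = []
    for item in items:
        # Matching: only the item's own content decides whether it is returned.
        if term not in item.content.lower():
            continue
        # Ranking: each anchor containing the term adds a hypothetical one-point bonus.
        score = 1 + sum(term in text.lower() for text in item.anchor_texts)
        scored.append((score, item.url))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [url for _, url in scored]


items = [
    Item("http://portal/a.docx", "Expense policy for travel", ["travel policy", "expenses"]),
    Item("http://portal/b.docx", "Travel booking guide"),
    Item("http://portal/c.docx", "Office seating plan", ["travel"]),
]
print(search(items, "travel"))
# ['http://portal/a.docx', 'http://portal/b.docx']  (c.docx matches only via anchor text)
```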
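URL Depth and URL Text Matching both work on the URL string alone. A minimal sketch, assuming depth is the slash count of the URL path (ignoring the `//` after the scheme, which every URL shares) and that matching is a case-insensitive substring test on the percent-decoded URL; the URLs and the helpers `url_depth` and `url_matches` are hypothetical:

```python
from urllib.parse import unquote, urlsplit


def url_depth(url: str) -> int:
    """Number of slashes in the URL path; fewer slashes means a shallower URL."""
    return urlsplit(url).path.count("/")


def url_matches(url: str, term: str) -> bool:
    """True if the term appears in the decoded URL, ignoring case."""
    return term.lower() in unquote(url).lower()


urls = [
    "http://portal/sites/hr/Shared%20Documents/Leave%20Policy.docx",
    "http://portal/hr/leave.aspx",
    "http://portal/sites/it/Documents/Archive/2006/old.docx",
]
for url in sorted(urls, key=url_depth):
    print(url_depth(url), url_matches(url, "leave"), url)
# Depths 2, 4 and 6; the first two URLs contain "leave", the archived one does not.
```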
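For Title Extraction, the key step is recognising placeholder titles. The sketch below assumes a placeholder is a default word followed by a number; `Slide1` and `Document1` come from the post, while `Book` and `Presentation` are assumed additions. `title_for_ranking` is a hypothetical helper that returns `None` when the title should be ignored.

```python
import re
from typing import Optional

# Default Office titles: Slide/Document are from the post; Book/Presentation are assumptions.
DEFAULT_TITLE = re.compile(r"(Slide|Document|Book|Presentation)\d+", re.IGNORECASE)


def title_for_ranking(title: Optional[str]) -> Optional[str]:
    """Return the title property if it is meaningful, or None if it is empty or a default."""
    if title is None:
        return None
    cleaned = title.strip()
    if not cleaned or DEFAULT_TITLE.fullmatch(cleaned):
        return None
    return cleaned


for t in ["Q3 Sales Forecast", "Slide1", "document12", "  ", None]:
    print(repr(t), "->", repr(title_for_ranking(t)))
# Only "Q3 Sales Forecast" survives; the others would not contribute to ranking.
```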
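Result Collapsing can be sketched by fingerprinting each result and keeping only the highest-ranked copy. The post does not say how MOSS decides that results are similar; this sketch swaps in a plain technique, a SHA-1 hash of the case- and whitespace-normalised text, which catches exact copies but not near-duplicates. The result list and the `collapse` helper are hypothetical, and the list is assumed to be in rank order already.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-1 of the text after lower-casing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def collapse(results):
    """Keep the first (highest-ranked) result per fingerprint.

    Returns (url, number_of_collapsed_copies) pairs in the original rank order.
    """
    kept, counts = [], {}
    for url, text in results:
        fp = fingerprint(text)
        if fp in counts:
            counts[fp] += 1
        else:
            counts[fp] = 1
            kept.append((url, fp))
    return [(url, counts[fp] - 1) for url, fp in kept]


results = [
    ("http://portal/hr/policy.docx", "Leave policy 2008"),
    ("http://portal/sites/team/policy.docx", "Leave  Policy 2008"),
    ("http://portal/archive/policy-copy.docx", "leave policy 2008"),
    ("http://portal/it/vpn.docx", "VPN setup guide"),
]
print(collapse(results))
# [('http://portal/hr/policy.docx', 2), ('http://portal/it/vpn.docx', 0)]
```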