Search

v0.2

09/05/2023 by Robert Sprankle

Printed Copies Are Uncontrolled

Wiki ID: 11246

Author: Steve Miller

Release: v0.2 [Minor]

Revision ID: 49497

Revision Date: 08-31-23

Author: Amanda Radakovich (and 4 other contributors)


Overview:

Like any other website, your Dozuki site has a search function that lets your users search for and filter results by specific terms, words, or phrases.

Results can be filtered by page type, and the information shown with each result helps you narrow your search. The default page type for searches can be changed with the Default Search Filter.

(Note: Private guides do not appear in search results for standard users. Some sites restrict access to private guides by author; on those sites, private guides do not appear in search results for anyone except the author.)

The tools and tricks below help you take full advantage of your site's search capabilities and understand the logic behind them.

Using Boolean Search Operators:

When searching text fields, you can use the Boolean operators `+` (AND), `|` (OR), and `-` (NOT).

Searching Multiple Terms in One Search:

If you separate search terms with `+` or a space, the search matches documents that contain all of the specified search terms. They are `AND`ed together.

Searching for Either/Or Multiple Terms in One Search:

You can use the `|` (`OR`) operator to separate terms when you want to match documents that contain either the preceding term(s) or the following term(s).

Searching with Excluded Terms:

To exclude documents that contain a particular term from the search results, prefix the term with the `-` (`NOT`) operator. For example, to search for all documents that do not contain the term "star" in the default search field, specify `-star`. The `NOT` operator applies only to individual terms. Searching for `-star+wars` retrieves all documents that do not contain the term "star" but do contain the term "wars".

Examples:

- `red|white|blue` matches documents that contain `red`, `white`, or `blue`.
- `"evans, chris"|"Garity, Troy"` matches documents that contain either the phrase "evans, chris" or the phrase "Garity, Troy".
- `-star+war|world` matches documents that do not contain "star" but do contain either "war" or "world".
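The following is a minimal Python sketch of how a query such as `-star+war|world` could be evaluated against a single document's text. It is not Dozuki's or the search service's implementation. It assumes that `|` binds more tightly than `+` or a space, which is what the `-star+war|world` example implies. It leaves out quoted phrases, stemming, and prefix matching. The function names are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def parse(query: str) -> list[list[tuple[bool, str]]]:
    """Split a query into AND-ed clauses, each a list of OR-ed alternatives.

    Each alternative is (negated, term). Assumption: '|' binds more tightly
    than '+' or a space, as the -star+war|world example implies.
    """
    clauses = []
    for clause in re.split(r"[+\s]+", query.strip()):
        if not clause:
            continue
        alternatives = []
        for term in clause.split("|"):
            # A leading '-' negates only this single term.
            alternatives.append((term.startswith("-"), term.lstrip("-").lower()))
        clauses.append(alternatives)
    return clauses


def matches(query: str, document: str) -> bool:
    """Return True if every clause has at least one satisfied alternative."""
    words = tokenize(document)
    return all(
        any((term not in words) if negated else (term in words)
            for negated, term in alternatives)
        for alternatives in parse(query)
    )
```

With this sketch, `matches("-star+war|world", "War Games")` is true and `matches("-star+war|world", "Star War")` is false.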

Using Wildcards:

You can use the `*` (asterisk) wildcard operator to perform prefix matching. The `*` operator applies only to individual terms. When you append `*` to a string, the string is treated as a prefix, and the search matches documents that contain the prefix followed by zero or more characters. Prefix searches are expanded to a maximum of 2,000 indexed terms. If more than 2,000 terms match the prefix, the search results will not include every possible match.

For example, the following query searches the title field for the prefix "star": `star*`. If you run this search against movie data, the response might include movies such as Stargate, Dark Star, and Starsky & Hutch.
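The following Python sketch illustrates prefix expansion with a cap on the number of expanded terms, using the movie-title example above. The helper names, the whitespace tokenizer, and the choice to keep the alphabetically first terms when the cap is reached are assumptions made for illustration. This page does not say which terms the real service keeps.

```python
MAX_PREFIX_EXPANSIONS = 2000  # limit stated in the text above


def expand_prefix(prefix: str, indexed_terms: list[str],
                  limit: int = MAX_PREFIX_EXPANSIONS) -> list[str]:
    """Return up to `limit` indexed terms that begin with `prefix`."""
    prefix = prefix.lower()
    expansions = []
    for term in sorted(indexed_terms):  # assumption: terms are kept in sorted order
        if not term.startswith(prefix):
            continue
        if len(expansions) >= limit:
            break  # cap reached: any remaining matching terms are dropped
        expansions.append(term)
    return expansions


def prefix_search(query: str, titles: list[str],
                  limit: int = MAX_PREFIX_EXPANSIONS) -> list[str]:
    """Evaluate a single-term query such as 'star*' against a list of titles."""
    prefix = query.rstrip("*")
    # Toy index: the set of lowercase, whitespace-separated title words.
    index = {word for title in titles for word in title.lower().split()}
    expanded = set(expand_prefix(prefix, list(index), limit))
    return [title for title in titles if expanded & set(title.lower().split())]
```

With `limit=1`, only the term `star` survives the expansion. Stargate and Starsky & Hutch therefore drop out of the results even though they match the prefix.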

Searching for Phrases:

You can enclose a phrase in double quotes to match the complete phrase rather than the individual terms in it. For example, the query `"iphone battery"` will match text like "Remove the _iPhone battery_" but not text like "remove the _battery_ from the _iPhone_."
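A phrase match differs from an all-terms match only in word order and adjacency. This hypothetical Python sketch compares the two on the example sentences above. It uses a simple lowercase tokenizer and ignores stemming.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in the same order."""
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def contains_all_terms(text: str, query: str) -> bool:
    """True if every word of the query appears anywhere in the text."""
    return set(words(query)) <= set(words(text))
```

Both sentences contain all of the terms, but only the first contains the phrase.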

CloudSearch Ranking:

Search results are ranked by a formula applied to the documents that match a user's search terms. To appear in the results, a document must contain all of the words (or phrases) specified. The exception is the `A|B` operator, which lets a document contain either A or B.

Common reasons why a matching document may not appear to contain all of the words:

- For many document types, we index document metadata as well as the text you see in any particular view of the document (e.g., we may include invisible text in PDFs or prerequisite step text in a guide).
- We index all translations of a document together, in the same document. If one language contains valid words from another language, a search in one language can therefore match a document in another language. We do this so we can show results in both the current view language and the site's default language in the same place.
- Our search service splits long sequences of letters and numbers at the boundaries between the letters and the numbers.
- Our search service removes common English suffixes such as "es" and "s" from words in both the query and the document. This practice is called "stemming."
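The last two behaviors can be approximated in a few lines of Python. This sketch is an illustration built on assumptions, not the search service's actual tokenizer or stemmer. It splits tokens at letter/digit boundaries and applies a deliberately crude plural-stripping rule; real stemmers handle far more cases. `A1466` is a hypothetical part number.

```python
import re


def split_letters_digits(token: str) -> list[str]:
    """Split a token at every boundary between letters and digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def naive_stem(word: str) -> str:
    """Strip common English plural suffixes (a crude stand-in for a real stemmer)."""
    w = word.lower()
    if len(w) > 4 and w.endswith(("sses", "xes", "zes", "ches", "shes")):
        return w[:-2]  # glasses -> glass, boxes -> box
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]  # screws -> screw
    return w


def index_terms(text: str) -> list[str]:
    """Produce the terms that both queries and documents would be reduced to."""
    terms = []
    for raw in re.findall(r"[A-Za-z0-9]+", text):
        terms.extend(naive_stem(piece) for piece in split_letters_digits(raw))
    return terms
```

Under these assumptions, a query for `screw` matches a document that says "screws", and a query for `1466` matches "A1466".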

Once the set of documents matching the search terms has been collected, the documents are ranked by a formula with a few components:

- The weighted text relevance. This is a score computed by the search service that is intended to be larger for documents that better match the query. It is the foundation for everything else. Our ranking formula modifies the text relevance to obtain the final score, and those adjustments are unlikely to lift a document with a small text relevance score above one that started with a much larger score. The score is based on several criteria:
  - The number of times each query word occurs in the document: more is better.
  - Where in the document a word matches: we break documents up into fields, and a match in the "title" field is much more valuable than a match in the "content" field. The "content" field is a catch-all for all document text that does not go into other fields. The rough ordering of the document fields by weight is:
    - Exact title (a character-for-character match of the entire title)
    - Title
    - Identifiers (product codes, Apple part numbers, etc.)
    - Description (summaries, introductions, etc.)
    - Content (everything else)
  - How large the document is: bigger documents are worse matches because they are more likely to contain any given word.
  - How "valuable" a query word is: words that occur across many documents are less valuable than words that occur in just a few documents.
- The document type. We make some very coarse adjustments based on document type, and these can have a significant impact on results. We apply a large penalty to products, teardowns, and Answers posts, and a small penalty to guides (relative to wikis). All else being equal, then, category pages and other wikis are likely to appear at the top of search results, followed by guides, and then everything else. If you filter by document type, however, these adjustments matter very little.
- The document popularity. We compute a popularity score for each document, based mostly on view statistics, and it can have a large impact on the final results. Popularity adjusts the ranking along a curve. Differences between documents with low popularity matter very little, and so do differences between documents near the maximum popularity. Between those two extremes, higher popularity generally means a higher rank.
- Title length. Because the namespace for titles is constrained, you generally cannot give two different documents the same title. We value shorter titles because they tend to be applied to more important documents. Consequently, documents with shorter titles receive a small boost over documents with longer titles.
- Content length. Content-heavy documents, meaning documents with an abundance of solid work put into them, tend to be our best documents. However, the text relevance score penalizes longer documents, so we balance that out with a small bonus for longer documents.
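There are many ways to combine the text relevance criteria above: term frequency, field weights, document size, and term rarity. The Python sketch below uses a field-weighted TF-IDF score normalized by document length. The field weights, the exact-title multiplier, and the formula itself are hypothetical. They only follow the rough field ordering described above, because the real values are not published on this page.

```python
import math

# Hypothetical weights that follow the rough field ordering above.
FIELD_WEIGHTS = {"title": 8.0, "identifiers": 4.0, "description": 2.0, "content": 1.0}
EXACT_TITLE_MULTIPLIER = 2.0  # hypothetical bonus for a full-title match

Doc = dict[str, list[str]]  # field name -> list of lowercase terms


def text_relevance(query_terms: list[str], doc: Doc, corpus: list[Doc]) -> float:
    """Score `doc` for `query_terms` relative to the documents in `corpus`."""
    n_docs = len(corpus)
    doc_length = sum(len(terms) for terms in doc.values()) or 1
    score = 0.0
    for term in query_terms:
        # Document frequency: how many documents contain the term in any field.
        df = sum(any(term in terms for terms in d.values()) for d in corpus)
        # Rarer terms get a larger inverse-document-frequency weight.
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        for field, weight in FIELD_WEIGHTS.items():
            # More occurrences, and occurrences in heavier fields, score higher.
            score += weight * doc.get(field, []).count(term) * idf
    if doc.get("title") == query_terms:
        score *= EXACT_TITLE_MULTIPLIER
    # Longer documents are penalized because they are likelier to contain any word.
    return score / math.sqrt(doc_length)
```

In this sketch, a document whose title exactly matches the query outranks one whose longer title merely contains the query words, even when their content is identical.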

In short, our ranking formula starts with the text relevance, boosts or reduces it according to popularity, title length, and content length, and then applies a coarse weight by document type.
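The following Python sketch shows one way such a formula could be structured. Every constant is hypothetical. The logistic curve is an assumed stand-in for the unspecified popularity curve, chosen because it is flat at both ends and steep in the middle, as described above.

```python
import math

# Hypothetical coarse weights by document type (wikis are the baseline).
DOC_TYPE_WEIGHTS = {
    "wiki": 1.0, "category": 1.0,                      # no penalty
    "guide": 0.9,                                      # small penalty
    "product": 0.5, "teardown": 0.5, "answers": 0.5,   # large penalty
}


def popularity_factor(popularity: float, midpoint: float = 0.5,
                      steepness: float = 10.0) -> float:
    """Map a popularity score in [0, 1] to a multiplier between 1.0 and 1.5.

    The logistic curve is flat near 0 and near 1 and steep in between.
    """
    return 1.0 + 0.5 / (1.0 + math.exp(-steepness * (popularity - midpoint)))


def final_score(relevance: float, popularity: float, title_words: int,
                content_words: int, doc_type: str) -> float:
    score = relevance * popularity_factor(popularity)
    score *= 1.0 + 0.1 / max(title_words, 1)          # small boost for short titles
    score *= 1.0 + 0.02 * math.log1p(content_words)   # small bonus for long content
    # Coarse document-type weight; unknown types are treated as penalized (assumption).
    return score * DOC_TYPE_WEIGHTS.get(doc_type, 0.5)
```

Because the document-type weight is applied last as a multiplier, it shifts whole groups of results relative to one another. This matches the coarse effect described above.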
