
Overview ¶ 

Like most websites, your Dozuki site has a search function that lets your users search for specific terms, words, or phrases and filter the results.

Results can be filtered by page type, and each result includes details that help you narrow your search.

The tools and tips below will help you take full advantage of your site's search capabilities and understand the logic behind them.

Using Boolean Search Operators ¶ 

When searching text fields, you can use the Boolean operators `+` (`AND`), `|` (`OR`), and `-` (`NOT`).

Search for Multiple Terms ¶ 

Searching Multiple Terms in One Search:

If you separate search terms with `+` or a space, the search matches documents that contain all of the specified search terms; the terms are `AND`ed together.

Searching for Either/Or Multiple Terms in One Search:

You can use the `|` (`OR`) operator to separate terms when you want to match documents that contain either the preceding term(s) or the following term(s).

Search with Excluding Terms ¶ 

To exclude documents that contain a particular term from the search results, prefix the term with the `-` (`NOT`) operator. For example, to search for all documents that don't contain the term `star` in the default search field, you would enter `-star`. The `NOT` operator applies only to individual terms: searching for `-star+wars` retrieves all documents that do not contain the term `star` but do contain the term `wars`.
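Conceptually, the three operators behave like set operations over an inverted index that maps each term to the documents containing it: `AND` is intersection, `OR` is union, and `NOT` is the complement against all documents. The sketch below uses a made-up four-document corpus and hypothetical helper names (`build_index`, `docs_with`, `all_of`, `any_of`, `none_of`); it illustrates the logic only and is not how Dozuki or CloudSearch actually implement search.

```python
# Hypothetical corpus for illustration only: document ID -> text.
DOCS = {
    1: "star wars a new hope",
    2: "wars of the roses",
    3: "dark star",
    4: "world war z",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def docs_with(term: str) -> set[int]:
    """Documents containing the term (empty set if the term is not indexed)."""
    return INDEX.get(term.lower(), set())


def all_of(*terms: str) -> set[int]:
    """AND (`+` or a space): intersection, so every term must be present."""
    return set.intersection(*(docs_with(t) for t in terms))


def any_of(*terms: str) -> set[int]:
    """OR (`|`): union, so at least one term must be present."""
    return set.union(*(docs_with(t) for t in terms))


def none_of(term: str) -> set[int]:
    """NOT (`-`): complement, so this single term must be absent."""
    return ALL_IDS - docs_with(term)


print(all_of("star", "wars"))               # star+wars  -> {1}
print(any_of("star", "wars"))               # star|wars  -> {1, 2, 3}
print(none_of("star") & docs_with("wars"))  # -star+wars -> {2}
```

Because `NOT` negates a single term, `-star+wars` is evaluated as "the complement of `star`" intersected with "documents containing `wars`".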

Examples ¶ 

  • `red|white|blue` matches documents that contain either `red`, `white`, or `blue`.
  • `"evans, chris"|"Garity, Troy"` matches documents that contain either the phrase "evans, chris" or the phrase "Garity, Troy".
  • `-star+war|world` matches documents that do not contain "star", but do contain either "war" or "world".
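To see how the operators combine, the sketch below evaluates a query string against a single document's text and reproduces the examples above. It assumes, as the `-star+war|world` example implies, that `|` groups more tightly than `+` or a space, and it does no stemming or prefix matching. The function names are hypothetical; this is not Dozuki's or CloudSearch's query parser.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric tokens (punctuation is dropped)."""
    return TOKEN.findall(text.lower())


def split_outside_quotes(text: str, separators: str) -> list[str]:
    """Split on any separator character that is not inside double quotes."""
    parts, current, in_quotes = [], "", False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch in separators and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return [p for p in parts if p]


def contains_sequence(doc_tokens: list[str], wanted: list[str]) -> bool:
    """True if `wanted` occurs as a contiguous run of tokens (a single term is a run of one)."""
    n = len(wanted)
    return n > 0 and any(doc_tokens[i:i + n] == wanted for i in range(len(doc_tokens) - n + 1))


def operand_matches(doc_tokens: list[str], operand: str) -> bool:
    """Evaluate one term or "quoted phrase", honoring a leading `-` (NOT)."""
    negated = operand.startswith("-")
    found = contains_sequence(doc_tokens, tokenize(operand.lstrip("-")))
    return not found if negated else found


def matches(query: str, document: str) -> bool:
    """Clauses separated by `+` or whitespace are ANDed; `|` alternatives within a clause are ORed."""
    doc_tokens = tokenize(document)
    return all(
        any(operand_matches(doc_tokens, alt) for alt in split_outside_quotes(clause, "|"))
        for clause in split_outside_quotes(query, "+ \t")
    )


print(matches("-star+war|world", "World War Z"))                      # True
print(matches("-star+war|world", "Star Wars"))                        # False
print(matches('"evans, chris"|"Garity, Troy"', "Cast: Garity, Troy"))  # True
```

Splitting only outside double quotes keeps a phrase such as `"evans, chris"` intact even though it contains a space.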

Using Wildcards ¶ 

You can use the `*` (asterisk) wildcard operator to perform prefix matching. The `*` operator only applies to individual terms. When you append the `*` operator to a string, the string is treated as a prefix. The search matches documents that contain the prefix followed by zero or more characters. Prefix searches are expanded to a maximum of 2,000 indexed terms. If more than 2,000 terms match the prefix, the search results will not include all possible matches.

For example, the following Boolean query searches the title field for the prefix "star": `star*`. If you perform this search against movie data, the response might contain movies such as "Stargate", "Dark Star", and "Starsky & Hutch".
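One way to picture prefix matching and its 2,000-term cap: expand the prefix into the indexed terms that start with it, then match documents containing any of those terms. The sketch below assumes a tiny hypothetical title index and uses `limit=2` only to make the truncation visible; it is not CloudSearch's implementation.

```python
import re
from bisect import bisect_left

MAX_EXPANSIONS = 2000  # cap on prefix expansion, as described above


def build_index(titles: list[str]) -> dict[str, set[str]]:
    """Hypothetical inverted index: lowercase term -> titles containing it."""
    index: dict[str, set[str]] = {}
    for title in titles:
        for term in re.findall(r"[a-z0-9]+", title.lower()):
            index.setdefault(term, set()).add(title)
    return index


def expand_prefix(sorted_terms: list[str], prefix: str, limit: int = MAX_EXPANSIONS) -> list[str]:
    """Collect up to `limit` indexed terms that start with `prefix`.

    In sorted order, all terms sharing a prefix are contiguous, so we
    binary-search to the first candidate and scan forward.
    """
    expanded = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix) and len(expanded) < limit:
        expanded.append(sorted_terms[i])
        i += 1
    return expanded


def prefix_search(index: dict[str, set[str]], query: str, limit: int = MAX_EXPANSIONS) -> set[str]:
    """Evaluate `prefix*` as an OR over the expanded terms."""
    prefix = query.rstrip("*").lower()
    results: set[str] = set()
    for term in expand_prefix(sorted(index), prefix, limit):
        results |= index[term]
    return results


INDEX = build_index(["Stargate", "Dark Star", "Starsky & Hutch", "Moonstruck"])
print(sorted(prefix_search(INDEX, "star*")))
# ['Dark Star', 'Stargate', 'Starsky & Hutch']
print(sorted(prefix_search(INDEX, "star*", limit=2)))
# ['Dark Star', 'Stargate']  (the "starsky" expansion was cut off)
```

"Dark Star" matches because the term `star` itself is the prefix followed by zero characters; with the cap lowered to two expansions, `starsky` is never searched, so "Starsky & Hutch" drops out of the results.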

Searching for Phrases ¶ 

You can enclose a phrase in double quotes to match the complete phrase rather than the individual terms in the phrase. For example, the query `"iphone battery"` will match text like "Remove the _iPhone battery_" but not text like "remove the _battery_ from the _iPhone_".
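The difference between a quoted phrase and the same words unquoted can be sketched as "consecutive tokens in order" versus "every token somewhere". The tokenizer and function names below are hypothetical simplifications for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase: str, text: str) -> bool:
    """Quoted query: the phrase's tokens must appear consecutively and in order."""
    wanted, tokens = tokenize(phrase), tokenize(text)
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def terms_match(query: str, text: str) -> bool:
    """Unquoted query: every term must appear somewhere, in any order."""
    return set(tokenize(query)) <= set(tokenize(text))


for text in ["Remove the iPhone battery", "remove the battery from the iPhone"]:
    print(text, "| phrase:", phrase_matches("iphone battery", text),
          "| terms:", terms_match("iphone battery", text))
```

Both texts satisfy the unquoted query, but only the first satisfies the quoted one.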

CloudSearch Ranking ¶ 

Our search results are ranked according to a formula applied to documents that match a user's search terms. To show up in the results, a document must contain all of the words (or phrases) specified. The exception is the `|` (`OR`) operator: a query like `A|B` matches documents that contain either A or B.

There are several reasons a matching document may not appear to contain all of the words:

  • For many document types, we index document metadata as well as the text that you see in any particular view of the document (e.g., we may include invisible text in PDFs or prerequisite step text in a guide).
  • We also index translations together in the same document, so if one language contains valid words from another language, it's possible for a search in one language to match a document in another language. We do this so that we can show results in both the current view language and the site default language in the same place.
  • Our search service breaks up long sequences of numbers and letters at the boundaries between the letters and the numbers.
  • Our search service removes common English suffixes like "es" and "s" from words in both the query and the document. This practice is called "stemming".
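The last two behaviors can be approximated as a small analysis step applied identically to queries and documents. Both rules below are assumptions: the text does not say what counts as a "long" sequence, so this sketch splits every letter/digit boundary, and for stemming it substitutes Harman's "S" stemmer, a well-known minimal plural-stripping algorithm, plus a short-word guard added for this sketch. CloudSearch's real tokenization and stemming rules may differ.

```python
import re

# Hypothetical splitting rule: break runs of letters and digits apart,
# so "A1466" is indexed as "a" and "1466".
ALNUM_RUN = re.compile(r"[a-z]+|[0-9]+")


def split_letters_digits(text: str) -> list[str]:
    return ALNUM_RUN.findall(text.lower())


def s_stem(word: str) -> str:
    """Harman's "S" stemmer for English plurals, plus a short-word guard."""
    if len(word) <= 3:  # guard added for this sketch, not part of Harman's rules
        return word
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"  # "batteries" -> "battery"
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-2] + "e"  # "cables" -> "cable"
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]        # "screws" -> "screw"
    return word


def analyze(text: str) -> list[str]:
    """Apply the same processing to both queries and documents."""
    return [s_stem(token) for token in split_letters_digits(text)]


print(analyze("A1466 batteries"))                             # ['a', '1466', 'battery']
print(analyze("Battery cables") == analyze("battery cable"))  # True
```

Because the query and the document pass through the same steps, a search for "cable" can match a document that only says "cables".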

Once the set of documents matching the search terms has been collected, they are ranked using a formula with several components:

  • The weighted text relevance. This is a score computed by the search service that should be larger for documents that better match the query. It is the foundation for everything else: our ranking formula modifies the text relevance to get the final score, and our adjustments are unlikely to raise a small text relevance score above an initially large one. The score is based on several criteria:
    • The number of times each word in the query occurs in the document: more is better.
    • Where a word in the document matches: we break documents up into fields, and a match in the "title" field is much more valuable than a match in the "content" field, which is a catch-all for all of the document text that doesn't go into other fields. From highest to lowest weight, the document fields are roughly:
      • Exact title (a character-for-character match of the entire title)
      • Title
      • Identifiers (product codes, Apple part numbers, etc.)
      • Description (summaries, introductions, etc.)
      • Content (everything else)
    • How large the document is: bigger documents are worse matches because they're more likely to contain any word.
    • How "valuable" a query word is: words that occur across many documents are less valuable than words that occur in just a few documents.
  • The document type. We apply some very coarse adjustments based on document type, and these can have a significant impact on results. We apply a large penalty to products, teardowns, and Answers posts, and a small penalty to guides (relative to wikis). So, all else being equal, category pages and other wikis are likely to appear at the top of search results, followed by guides, and then everything else. You can filter by document type, though, in which case these adjustments matter much less.
  • The document popularity. We compute a popularity score for documents based mostly on view statistics, and this score can have a large impact on the final results. We adjust the ranking by popularity on a curve, so differences between documents with low popularity don't matter much, and differences between documents near the maximum popularity also don't matter much. Between those two extremes, though, higher popularity generally means a higher rank.
  • Title length. Because the namespace for titles is constrained (you generally can't give two documents the same title), shorter titles tend to be applied to more important documents, so we value them. Consequently, documents with shorter titles get a small boost over documents with longer titles.
  • Content length. Documents with lots of content tend to be our best documents (a lot of content usually means a lot of solid work has gone into the document), but the text relevance score penalizes longer documents, so we balance that out with a small bonus for longer documents.
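Taken together, the text-relevance criteria above resemble classic TF-IDF scoring with per-field weights and length normalization. The sketch below is a hypothetical TF-IDF scorer for illustration, not CloudSearch's actual relevance formula: the field weights, the exact-title bonus, the case-insensitive title comparison, and the square-root length normalization are invented choices that only respect the field ordering listed above.

```python
import math
import re

# Hypothetical weights: only their ORDER comes from the list above.
FIELD_WEIGHTS = {"title": 8.0, "identifiers": 4.0, "description": 2.0, "content": 1.0}
EXACT_TITLE_BONUS = 16.0  # hypothetical extra weight for an exact title match


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def doc_tokens(doc: dict[str, str]) -> list[str]:
    """All tokens from every field of a document."""
    return [t for field in doc.values() for t in tokenize(field)]


def idf(term: str, corpus: list[dict[str, str]]) -> float:
    """Words found in fewer documents are worth more (smoothed inverse document frequency)."""
    df = sum(1 for doc in corpus if term in doc_tokens(doc))
    return math.log((len(corpus) + 1) / (df + 1)) + 1.0


def text_relevance(query: str, doc: dict[str, str], corpus: list[dict[str, str]]) -> float:
    score = 0.0
    for term in tokenize(query):
        # More occurrences score higher, and matches in heavier fields count more.
        weighted_tf = sum(w * tokenize(doc.get(f, "")).count(term) for f, w in FIELD_WEIGHTS.items())
        score += weighted_tf * idf(term, corpus)
    # Exact title match (compared case-insensitively here, an assumption).
    if doc.get("title", "").strip().lower() == query.strip().lower():
        score += EXACT_TITLE_BONUS
    # Longer documents are more likely to contain any given word, so normalize by length.
    return score / math.sqrt(max(len(doc_tokens(doc)), 1))


CORPUS = [
    {"title": "Battery", "content": "Overview of battery types."},
    {"title": "iPhone Battery Replacement", "content": "Remove the screws and lift the battery."},
    {"title": "iPhone Screen Replacement", "content": "Disconnect the battery connector, then remove the screws."},
]
ranked = sorted(CORPUS, key=lambda d: text_relevance("battery", d, CORPUS), reverse=True)
print([d["title"] for d in ranked])
# ['Battery', 'iPhone Battery Replacement', 'iPhone Screen Replacement']
```

All three documents contain "battery", so the ordering comes entirely from the exact-title bonus, the title-field weight, and the length normalization.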

Our ranking formula starts with the text relevance, boosts or reduces it according to popularity, title length, and content length, and then applies a coarse weight by document type.
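That sequence can be sketched as a chain of multipliers applied to the text relevance score. Everything numeric here is hypothetical: the multiplicative combination, the logistic popularity curve, the title and content factors, and the document-type weights are assumptions chosen only to reproduce the qualitative behavior described above (little popularity effect at the extremes, a slight preference for short titles and long content, and wikis ahead of guides ahead of products, teardowns, and Answers posts).

```python
import math
from dataclasses import dataclass

# Hypothetical multipliers; only their relative order comes from the text above.
DOCTYPE_WEIGHTS = {
    "wiki": 1.0,      # category pages and other wikis: the baseline
    "guide": 0.9,     # small penalty
    "product": 0.5,   # large penalty
    "teardown": 0.5,  # large penalty
    "answers": 0.5,   # large penalty
}


@dataclass
class ScoredDoc:
    text_relevance: float  # from the search service
    popularity: float      # assumed normalized to the range 0.0-1.0
    title_words: int
    content_words: int
    doctype: str


def popularity_factor(p: float) -> float:
    """S-shaped curve from 0.5 to 1.5: nearly flat at both ends, steep in the middle."""
    return 0.5 + 1.0 / (1.0 + math.exp(-12.0 * (p - 0.5)))


def title_factor(words: int) -> float:
    """Small boost that shrinks as the title gets longer."""
    return 1.0 + 0.2 / max(words, 1)


def content_factor(words: int) -> float:
    """Small bonus that grows slowly with content length."""
    return 1.0 + 0.02 * math.log1p(words)


def final_score(doc: ScoredDoc) -> float:
    score = doc.text_relevance
    score *= popularity_factor(doc.popularity)
    score *= title_factor(doc.title_words)
    score *= content_factor(doc.content_words)
    # Coarse document-type weight last; unknown types are left unweighted in this sketch.
    return score * DOCTYPE_WEIGHTS.get(doc.doctype, 1.0)


wiki = ScoredDoc(2.0, 0.5, 2, 800, "wiki")
teardown = ScoredDoc(2.0, 0.5, 2, 800, "teardown")
print(final_score(wiki) > final_score(teardown))  # True
```

A logistic curve is one simple way to get the "flat at both ends" popularity behavior; any similar S-shaped function would serve the same illustrative purpose.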