The Secret Google Quality Raters’ Handbook
Last month the Google Quality Raters’ Handbook was leaked online. This is what Google deems important when judging the quality and relevance of a web page.
In spite of what many say, Google does not entirely rely on automatic computer-based algorithms in its search engine rankings.
They do employ human editors who review the quality of selected sites, and they may give a boost to sites these reviewers deem especially useful.
The guidelines given to these human editors may also give search engine marketers an idea about what Google is looking for when judging quality and relevance. In short: If you want your site to rank well, it makes sense to take a look at the same guidelines.
However, Google has so far managed to keep its search engine algorithms secret, and search engine marketers have had to rely on reverse engineering to find out what makes a web page succeed, or, on the black hat side of SEM, what you can do to trick Google into accepting spam.
A few weeks ago, however, the Google Quality Raters’ Handbook was leaked online. Google managed to get some versions of the file off the web quickly, but by then, of course, it was too late.
(There are still PDF versions available online. Search for “Guidelines for Quality Raters”.)
First, it is interesting to note that Google operates with three main types of searches and, presumably, tries to keep a balance between the three on search engine result pages (SERPs):
The three query types are:
- Navigational (someone is looking for a specific site, e.g. Pandia)
- Informational (someone is looking for information on a specific topic of interest)
- Transactional (someone is out to find a product or service to buy)
Web pages may be relevant to one or more of these three categories for any given search query.
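As a rough illustration only (nothing like this appears in the handbook), the three query intents can be sketched as a keyword heuristic. The hint lists and the short-query rule below are our assumptions, not Google’s method:

```python
# Toy heuristic for the three query types described above.
# Keyword lists and the short-query rule are illustrative assumptions.
TRANSACTIONAL_HINTS = {"buy", "price", "cheap", "order", "discount"}
INFORMATIONAL_HINTS = {"how", "what", "why", "guide", "history"}

def classify_query(query: str) -> str:
    words = set(query.lower().split())
    if words & TRANSACTIONAL_HINTS:
        return "transactional"
    if words & INFORMATIONAL_HINTS:
        return "informational"
    # Short queries naming a single entity often signal navigational intent.
    if len(words) <= 2:
        return "navigational"
    return "informational"
```

Real intent classification is far subtler (the query [pandia hotel prices] mixes all three), but the sketch shows why the same page can be relevant to several intents at once.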
Five relevance categories
Google sorts web pages into five different categories of relevance:
1. Vital pages are pages that are considered to be the official page related to the query.
If you search for “Pandia”, www.pandia.com/index.html will be that page, even if there are other sites out there that give you info on Pandia.
As anyone who has used Google to search for a specific hotel will know, this is not as simple as it sounds. The home page of the hotel is often hard to find, and the results fill up with various hotel affiliate portals (which are considered spam by Google, according to these guidelines).
As the handbook points out regarding “Vital” pages, “the dominant interpretation [of the query] is navigational”. This means that there is one, and only one, correct result.
It makes little sense to strive for this status for any page other than the one evaluated to be the “official page of the query”.
2. Useful pages are pages that are highly satisfying, comprehensive, high in quality and authoritative:
“Useful pages answer the query just right; they are neither too broad nor too specific.”
You really want your pages to be in this category, and the only way to achieve that is to write highly informative content: articles in the case of informational queries, and pages that allow the user to find a specific product and complete the intended transaction in the case of shopping queries.
3. Relevant pages are less comprehensive or less authoritative than the “useful” pages.
Even if a page is deemed “Relevant”, it may also be categorized as spam, cf. the hotel affiliate sites mentioned above.
According to the guidelines, examples of relevant pages include:
“…a page with a brief article on the topic of the query or a less important subpage on the correct site. If a query ‘asks’ for a list, then a single item is Relevant. For example, if the query is [fudge recipes], a single fudge recipe is relevant.”
4. Not Relevant pages are pages with outdated or poor content.
5. Off Topic pages have no relevance to this particular query, although they may get a different category for other queries.
Note that Google says that if navigation to helpful content is very difficult, a rating of Off-Topic may be assigned.
There are also other categories, like
- Didn’t Load
- Foreign Language
and flags like
- Pornographic content (pages that may be filtered out if the searcher’s content filter is on)
- Malicious code on pages (pages that are to be excluded from the index)
The raters are asked to ascertain whether the pages should be considered spam. There are three categories:
- Not Spam, i.e. pages that have not been designed using deceitful web design techniques.
- Maybe Spam (it certainly looks spammy, but the rater is not sure)
- Spam: Pages that are violating Google’s webmaster guidelines.
Scraped, borrowed or stolen content
Given Google’s love of original content, it comes as no surprise that the document argues strongly against scraped content, i.e. content that is automatically fetched or stolen from other sites (like Wikipedia, DMOZ, RSS feeds and so on).
This doesn’t mean that you cannot include RSS headlines from other sites on your own web pages. We do. But these should come in addition to your own original content, not be the only content provided.
Google’s raters are told to copy snippets of text from the page and search for similar content on the web.
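That snippet check can be crudely approximated in code with word shingling. A minimal sketch; the four-word shingle size and the Jaccard measure are our assumptions, not anything from the guidelines:

```python
def shingles(text: str, k: int = 4) -> set:
    """Return the set of k-word shingles (overlapping word windows) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of two texts' shingle sets: 0.0 = no shared
    shingles, 1.0 = identical shingle sets (likely copied content)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A page whose text scores close to 1.0 against a Wikipedia article or a DMOZ description would be exactly the kind of scraped content the raters are hunting for.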
Google also considers so-called “thin affiliates” as spammers.
A thin affiliate is a site that gives you no original content and that only provides copied descriptions of products with affiliate links.
The guidelines give a list of features that may help the rater determine whether the site is a “true merchant”:
- a “view your shopping cart” link that stays on the same site and updates when you add items to it.
- a return policy with a physical address
- a shipping charge calculator
- a “wish list” link or a link to postpone purchase of an item until later
- a way to track FedEx orders
- a user forum
- the ability to register or login
- a gift registry
- an invitation to become an affiliate of that site
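The checklist lends itself to a simple signal count. Here is a sketch with made-up feature names and an arbitrary threshold of three; the handbook itself prescribes no such scoring rule:

```python
# Illustrative signal names mirroring the checklist above (our naming).
MERCHANT_SIGNALS = {
    "shopping_cart", "return_policy_with_address", "shipping_calculator",
    "wish_list", "order_tracking", "user_forum", "account_login",
    "gift_registry", "affiliate_program",
}

def looks_like_true_merchant(page_features: set, min_signals: int = 3) -> bool:
    """Count how many merchant signals a page exhibits. The threshold
    of three is an arbitrary assumption for illustration only."""
    return len(page_features & MERCHANT_SIGNALS) >= min_signals
```

A thin affiliate exhibits few or none of these signals, which is precisely what makes the checklist useful to a human rater.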
Google notes that if a page offers some value in addition to its links to the merchant, it is not to be considered a thin affiliate. Such content may be price comparison functionality, product reviews, recipes, lyrics etc.
Note that the raters may look for a wide variety of spam techniques. We found the following techniques mentioned explicitly:
- Hidden text on page (if the intention is to trick the search engine)
- Sneaky redirects (where the intention is to present different content to the search engine robot than to the human visitor)
- Keyword stuffing
- Copied text (for instance, Wikipedia or DMOZ content) plus ads
- Parked domain
- Pages with pay-per-click ads (PPC) and no added value
- 100% Frame (i.e. a page with two frames where one is hidden)
- Fake directories with PPC
- Fake blogs with PPC
- Fake message boards with PPC
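Of the techniques above, hidden text is the easiest to probe mechanically. A naive sketch that flags inline styles commonly used to hide keyword-stuffed text; note that per the guidelines only deceptive intent makes hidden text spam, so a hit here is a reason to inspect, not a verdict:

```python
import re

# Inline-style patterns commonly used to hide text from human visitors.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0",
    re.IGNORECASE,
)

def flags_hidden_text(html: str) -> bool:
    """Flag markup that hides text via inline CSS. This catches only the
    crudest cases; same-colour text on a same-colour background, for
    instance, would need rendered-page analysis to detect."""
    return bool(HIDDEN_STYLE.search(html))
```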
An example from the Handbook
Here is one of several examples Google provides for its raters:
A page that pops up for the query Nicole Kidman is:
Vital if it is Nicole Kidman’s official page. But Google adds that the rater should be aware that other sites may claim to be official.
Useful pages are pages that are comprehensive resources for Nicole Kidman: “a comprehensive resource would include her biography, filmography, pictures, etc. The page may even be a personal fan page.”
Relevant pages are for instance “news articles of at least one paragraph with timely and informative material about Nicole Kidman”.
Not Relevant are pages containing little information about Nicole Kidman. (This would apply to the page you are reading now: it is highly relevant to the query Google Quality Raters Handbook, but even though it contains the term Nicole Kidman, there is no useful information on her here.)
Under the Off-Topic category, Google notes that well-known actresses and personalities are often exploited for porn and spam. It gives nicolekidman.org as an example.