Google Ranking Factors & Quality Rating Criteria


I think a special mention should go to Bill Slawski, who does a great job of analysing SEO-related patents (and to whom I might owe joint writing credit for this article). I did not want to send people directly to the patent application site, and Bill deserves the link(s), I think. Follow Bill on Twitter for SEO patent analysis.

I hope this is not a typical ‘top 200 Google ranking factors’ post:

Bill Slawski: "I've been quoted and cited a bit in this post, which is worth spending some time going through. It's one of the most detailed lists of ranking signals that I've seen."

Many SEOs (myself included) have been focused on site quality for the last few years (especially after 2011). I hope this post lays out why some SEOs think a quality rating factor is at play in everything Google does to rank its SERPs.

I personally think website history plays a big role in ranking, for instance – an example of something no SEO can really do anything about for a long time.

I also think a lot of ‘ranking factors’ we are interested in are second-order effects – i.e. not something we can test in isolation or analyse accurately.

It is clear looking at the patents that Google is spending a lot of time in a lot of areas trying to work out ‘quality’.

Not everything in the patents, of course, makes it into the final Google ‘algorithm’. But this IS what some Google engineers are thinking about – and THAT is IMPORTANT. Then (on top of algorithmic filtering) you have the Google quality raters working out how good these algorithms are getting, which leads to more improvement over time.

When it comes to ranking factors, I usually listen to:

  • Google
  • Google spokespeople
  • Blackhat SEO (These are the people who told us all how Google worked back in the day!)
  • White Hat SEO

…in that order, too.

Read on if you want to understand some of the challenges Google are trying to solve and the solutions they have come up with. If you are a student of SEO, this page might help you (it’s my notes on the subject, and take heed that these would be filed squarely under ‘speculation’).


Google Ranking Signals

Google crawls the web and ranks pages. Where a page ranks in Google is down to how Google rates the page. There are hundreds of ranking signals SEOs think we know about from various sources.

Ranking based on a ‘quality metric

Another problem we were having was an issue with quality, and this was particularly bad (we think of it as around 2008, 2009 to 2011). We were getting lots of complaints about low-quality content and they were right. We were seeing the same low-quality thing, but our relevance metrics kept going up, and that’s because the low-quality pages can be very relevant. This is basically the definition of a content farm in our vision of the world. So we thought we were doing great, our numbers were saying we were doing great, and we were delivering a terrible user experience, and it turned out we weren’t measuring what we needed to. So what we ended up doing was defining an explicit quality metric which got directly at the issue of quality. It’s not the same as relevance …. and it enabled us to develop quality-related signals separate from relevance signals and really improve them independently. So when the metrics missed something, what ranking engineers need to do is fix the rating guidelines… or develop new metrics. SMX West 2016 – How Google Works: A Google Ranking Engineer’s Story (VIDEO)

Ranking based on a ‘classifier

And we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons… Matt Cutts 2011 (source: Bill Slawski: Google’s Quality Score Patent: The Birth of Panda?)

Ranking based on a ‘site quality score

The score is determined from quantities indicating user actions of seeking out and preferring particular sites and the resources found in particular sites. A site quality score for a particular site can be determined by computing a ratio of a numerator that represents user interest in the site as reflected in user queries directed to the site and a denominator that represents user interest in the resources found in the site as responses to queries of all kinds. The site quality score for a site can be used as a signal to rank resources, or to rank search results that identify resources, that are found in one site relative to resources found in another site. How Google May Calculate Site Quality Scores (from Navneet Panda)
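
To make the arithmetic concrete, here is a minimal Python sketch of that ratio, assuming we can count queries that explicitly seek out the site (the numerator) and clicks its pages receive as answers to queries of all kinds (the denominator). The function name, inputs and smoothing constant are illustrative, not from the patent.

```python
def site_quality_score(branded_query_count, total_result_selections, smoothing=1.0):
    """Hypothetical Panda-style ratio: user interest expressed directly in the site
    (queries that seek it out) divided by user interest in its pages as answers
    to queries of all kinds. Smoothing avoids division by zero for tiny sites."""
    return branded_query_count / (total_result_selections + smoothing)

# A site that is often sought out relative to the generic traffic it receives
# would score higher than one that is rarely sought out.
print(site_quality_score(branded_query_count=5_000, total_result_selections=50_000))  # ~0.1
print(site_quality_score(branded_query_count=50, total_result_selections=50_000))     # ~0.001
```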

QUOTE: “Don’t think of Panda as a ‘penalty‘”

“It’s part of core ranking algorithm & if you don’t improve the site, a move won’t help you untangle yourself. You need to improve the quality of the site, moving domains will not improve your ranking (context is panda)”  Gary Illyes – Twitter

QUOTE: Panda ‘measures the quality of a site pretty much by looking at the vast majority of the pages

It measures the quality of a site pretty much by looking at the vast majority of the pages at least. But essentially allows us to take quality of the whole site into account when ranking pages from that particular site and adjust the ranking accordingly for the pages. So essentially, if you want a blunt answer, it will not devalue, it will actually demote. Basically, we figured that site is trying to game our systems, and unfortunately, successfully. So we will adjust the rank. We will push the site back just to make sure that it’s not working anymore.  Gary Illyes – Search Engine Land

Ranking based on a content-based site quality score ‘determining a predicted site quality score for the new site from the aggregate site quality score

“In general, one innovative aspect of the subject matter described in this specification can be embodied in methods that include the actions of obtaining baseline site quality scores for a plurality of previously-stored sites; generating a phrase model for a plurality of sites including the plurality of previously-scored sites, wherein the phrase model defines a mapping from phrase-specific relative frequency measures to phrase-specific baseline site quality scores; for a new site, the new site not being one of the plurality of previously-scored sites, obtaining a relative frequency measure for each of a plurality of phrases in the new site; determining an aggregate site quality score for the new site from the phrase model using the relative frequency measures of the plurality of phrases in the new site; and determining a predicted site quality score for the new site from the aggregate site quality score.” Using Ngram Phrase Models to Generate Site Quality Scores: Bill Slawski
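
A toy sketch of what such an n-gram phrase model might look like in Python, under the assumption that each phrase’s score is simply the frequency-weighted average quality of the previously-scored sites it appears on. The data shapes and weighting are my own simplifications of the patent language.

```python
from collections import defaultdict

def build_phrase_model(scored_sites):
    """scored_sites: list of (phrase_frequencies: dict[str, float], quality_score: float).
    Maps each phrase to the frequency-weighted mean quality of the sites it appears in."""
    totals, weights = defaultdict(float), defaultdict(float)
    for phrase_freqs, score in scored_sites:
        for phrase, freq in phrase_freqs.items():
            totals[phrase] += freq * score
            weights[phrase] += freq
    return {p: totals[p] / weights[p] for p in totals}

def predict_site_quality(phrase_model, new_site_freqs, default=0.5):
    """Aggregate the phrase-level scores for a new (unscored) site, weighted by
    how often each phrase occurs on that site."""
    num = sum(freq * phrase_model.get(p, default) for p, freq in new_site_freqs.items())
    den = sum(new_site_freqs.values()) or 1.0
    return num / den

model = build_phrase_model([
    ({"research shows": 0.004, "click here": 0.001}, 0.9),  # previously-scored, high quality
    ({"click here": 0.010, "buy cheap": 0.012}, 0.2),       # previously-scored, low quality
])
print(predict_site_quality(model, {"buy cheap": 0.02, "research shows": 0.001}))
```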

Rankings based on ‘a relevance of the group of blog documents to the search query and a quality of the group of blog documents’

A blog search engine may receive a search query. The blog search engine may determine scores for a group of blog documents in response to the search query, where the scores are based on a relevance of the group of blog documents to the search query and a quality of the group of blog documents. The blog search engine may also provide information regarding the group of blog documents based on the determined scores. Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Ranking based on a ‘duration metric’

The average duration metric for the particular group of resources can be a statistical measure computed from a data set of measurements of a length of time that elapses between a time that a given user clicks on a search result included in a search results web page that identifies a resource in the particular group of resources and a time that the given user navigates back to the search results web page. …Thus, the user experience can be improved because search results higher in the presentation order will better match the user’s informational needs. High Quality Search Results based on Repeat Clicks and Visit Duration
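
In plain code, the duration metric described above could be as simple as averaging the gap between a click on a result and the return to the results page. A sketch, assuming we already have those two timestamps per click; the patent only requires “a statistical measure”, so the mean here is one possible choice.

```python
from statistics import mean

def average_click_duration(click_events):
    """click_events: list of (click_time, return_to_serp_time) pairs, in seconds.
    Returns the average time users spent on the result before returning to the SERP."""
    durations = [returned - clicked for clicked, returned in click_events]
    return mean(durations) if durations else 0.0

# Results that tend to be abandoned quickly produce a low value.
print(average_click_duration([(0, 95), (10, 250), (30, 42)]))
```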

Ranking based on a ‘duration performance score

The duration performance scores can be used in scoring resources and websites for search operations. The search operations may include scoring resources for search results, prioritizing the indexing of websites, suggesting resources or websites, protecting particular resources or websites from demotions, precluding particular resources or websites from promotions, or other appropriate search operations. A Panda Patent on Website and Category Visit Durations

Quote: ‘Training models on when someone clicks on a page and stays on that page’

So when search was invented, like when Google was invented many years ago, they wrote heuristics that had to figure out what the relationship between a search and the best page for that search was. And those heuristics worked pretty well and continue to work pretty well. But Google is now integrating machine learning into that process. So then, training models on when someone clicks on a page and stays on that page, when they go back, and trying to figure out exactly that relationship. Google Search Uses Click Data For Rankings?

Quote: ‘Quality signals that are site-wide’ applying a ‘current score’ to ‘all webpages’

The interesting part is that Google did say they do use overall sitewide signals for new pages to rank. John Mueller said this earlier this year: “there are some things where we do look at a website overall though.”

John did also indicate that a site-wide score or a score on ‘parts’ of a site (created by Panda analysis) was a floating point that can be improved.

Quote: ‘there’s not just this one site-wide quality score that we look at

I think there is probably a misunderstanding that there’s this one site-wide number that Google keeps for all websites and that’s not the case.  We look at lots of different factors and there’s not just this one site-wide quality score that we look at. So we try to look at a variety of different signals that come together, some of them are per page, some of them are more per site, but it’s not the case where there’s one number and it comes from these five pages on your website.

Link Quality Scores

Rank assigned to ‘a document is calculated from the ranks of documents citing it’ – ‘Google Pagerank’

DYK that after 18 years we’re still using* PageRank (and 100s of other signals) in ranking? Gary Illyes from Google – Search Engine Roundtable

We can only presume Google still uses Pagerank (or something like it) in its ordering of web pages.

A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality. The Original PageRank Patent Application – Bill Slawski
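
For readers who have never seen it written down, a toy power-iteration version of the PageRank idea described in that patent might look like the following. This is the textbook formulation, not Google’s production code, and the damping factor stands in for the “random jump” constant mentioned above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over an adjacency dict {page: [pages it links to]}.
    All link targets must appear as keys. The damping factor models the probability
    that a surfer follows a link rather than jumping to a random page."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```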

A high pagerank (a signal usually calculated for regular web pages) is an indicator of high quality and, thus, can be applied to blog documents as a positive indication of the quality of the blog documents.  Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

*Google evidently does not throw the baby out with the bathwater. If Google still uses Pagerank, then perhaps they still use tons of other legacy methods of ranking websites.

A ‘measure of quality’ based on ‘the number’ of links:

A system can determine a measure of quality for a particular web resource based on the number of other resources that link to the particular web resource and the amount of traffic the resource receives. For example, a ranking process may rank a first web page that has a large number of other web pages that link to the first web page higher than a web page having a smaller number of linking web pages. Did the Groundhog Update Just Take Place at Google? BILL SLAWSKI

A ‘measure of quality’ based on ‘traffic received by use of those links’

However, a resource may be linked to by a large number of other resources, while receiving little traffic from the links. For example, an entity may attempt to game the ranking process by including a link to the resource on another web page. This large number of links can skew the ranking of the resources. To prevent such skew, the system can evaluate the “mismatch” between the number of linking resources and the traffic generated to the resource from the linking resources. If a resource is linked to by a number of resources that is disproportionate with respect to the traffic received by use of those links, that resource may be demoted in the ranking process. Did the Groundhog Update Just Take Place at Google? BILL SLAWSKI
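
A hedged sketch of that “mismatch” check: compare the traffic a page actually receives from its links with what you would expect given how many pages link to it, and damp the link-based score when the gap is disproportionate. The expected-clicks constant and the scaling are invented for illustration.

```python
def mismatch_demotion(linking_resources, clicks_from_links, expected_clicks_per_link=2.0):
    """Returns a multiplier for the link-based component of a score: 1.0 means no
    demotion; values below 1.0 damp the score when many links send little traffic."""
    expected = linking_resources * expected_clicks_per_link
    if expected == 0:
        return 1.0
    return min(1.0, clicks_from_links / expected)

print(mismatch_demotion(linking_resources=10_000, clicks_from_links=120))  # heavy demotion
print(mismatch_demotion(linking_resources=40, clicks_from_links=300))      # no demotion
```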

A ‘measure of quality’ based on link ‘selection quality score’

The selection quality score may be higher for a selection that results in a long dwell time (e.g., greater than a threshold time period) than the selection quality score for a selection that results in a short dwell time (e.g., less than a threshold time period). As automatically generated link selections are often of a short duration, considering the dwell time in determining the seed score can account for these false link selections. Did the Groundhog Update Just Take Place at Google? BILL SLAWSKI

Ranking based on the ‘extent to which a document is selected’

A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on a when a ‘document is clicked more than other blog documents when the blog document appears in result sets

For example, if a certain blog document is clicked more than other blog documents when the blog document appears in result sets, this may be an indication that the blog document is popular and, thus, a positive indicator of the quality of the blog document.  Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Ranking based on ‘a decrease in a rate or quantity of new links that point to the document over time

A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score. The Original Historical Data Patent Filing and its Children – Bill Slawski
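
A simple illustration of that staleness logic, assuming we track how many new links a document attracts per month; the penalty figure is a placeholder, not something disclosed in the patent.

```python
def demote_if_stale(initial_score, new_links_per_month, staleness_penalty=0.8):
    """If the rate of new inbound links has been falling month on month, classify
    the document as stale and reduce its score; otherwise leave it unchanged."""
    rates_decreasing = all(curr <= prev for prev, curr in
                           zip(new_links_per_month, new_links_per_month[1:]))
    if rates_decreasing and len(new_links_per_month) >= 3:
        return initial_score * staleness_penalty
    return initial_score

print(demote_if_stale(10.0, [40, 25, 12, 5]))  # stale -> demoted
print(demote_if_stale(10.0, [5, 12, 25, 40]))  # growing -> untouched
```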

Ranking based on ‘on the time-varying behaviour of the links pointing to the document

A system may determine time-varying behavior of links pointing to a document, generate a score for the document based, at least in part, on the time-varying behaviour of the links pointing to the document, and rank the document with regard to at least one other document based, at least in part, on the score. The Original Historical Data Patent Filing and its Children – Bill Slawski

Spam identified by the ‘sudden growth in the number of apparently independent peers (e.g., unrelated websites)’

A sudden growth in the number of apparently independent peers (e.g., unrelated web sites), incoming and/or outgoing, with large number of links to individual pages may indicate a potentially synthetic web graph, e.g., which in turn may signal an attempt to spam the search engine. This indication may be strengthened if the growth corresponds to anchortext that is unusually coherent or discordant. This information can be used to demote the impact of such links – e.g., in a link-based ranking system such as the one proposed by Brin et al. – either as a binary decision item (e.g., demote score by fixed amount) or a multiplicative factor. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘Unique words, bigrams, phrases in anchor text

In one embodiment, the link or web graphs and their behavior over time may be monitored and used for scoring, spam detection or other purposes by a search engine. Naturally developed web graphs typically involve independent decisions. Synthetically generated web graphs – usually indicative of an intent to spam a search engine – are based on coordinated decisions; as such, the profile of growth in anchor words/bigrams/phrases is likely to be relatively spiky in this instance. One reason for such spikiness may be the addition of a large number of identical anchors from many places; another possibility may be addition of deliberately different anchors from a lot of places. With this in mind, in one embodiment of the invention, this information could be monitored and factored into scoring a document by capping the impact of suspect anchors associated with links thereto on the associated document score (a binary decision). In another embodiment, a continuous scale for the likelihood of synthetic generation is used, and a multiplicative factor to scale the score for the document is derived. The Original Historical Data Patent Filing and its Children – Bill Slawski
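
One crude way to model a “spiky” anchor text profile is to measure how dominant the single most common anchor phrase is, then cap the weight of those anchors when the profile looks coordinated. The threshold and cap below are assumptions for the sake of the sketch.

```python
from collections import Counter

def anchor_spikiness(anchor_texts):
    """Share of inbound links that use the single most common anchor phrase.
    Naturally grown profiles tend to be diverse; coordinated link building repeats."""
    counts = Counter(anchor_texts)
    return max(counts.values()) / len(anchor_texts)

def capped_anchor_weight(anchor_texts, spike_threshold=0.6, cap=0.5):
    """Binary version of the capping idea: if the profile looks synthetic,
    cap the impact of those anchors on the document score."""
    return cap if anchor_spikiness(anchor_texts) > spike_threshold else 1.0

links = ["best blue widgets"] * 80 + ["acme.com", "click here", "widgets review"] * 5
print(anchor_spikiness(links), capped_anchor_weight(links))
```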

Ranking based on ‘changes over time in anchor text

In one embodiment of the invention, the time-varying behavior of anchortext (e.g., the text in which a hyperlink is embedded, typically underlined or otherwise highlighted in a document) associated with a document may be used to score the document. For example, in one embodiment, changes over time in anchortext corresponding to inlinks to a document may be used as an indication that there has been update or even change of focus in the document; a relevancy score may take this change(s) into account. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘bookmarks‘ or ‘favourites

In one embodiment of the invention, data maintained or generated by a user may be monitored over time and used to score one or more documents by a search engine. For example, in one embodiment of the invention where the search engine, either directly or indirectly, has access to the “bookmarks” or “favorites” lists maintained by users’ browser programs, the search engine may monitor upward and downward trends, rates thereof, etc., that a document (or more specifically, a path thereto) is added or deleted to, or accessed through, such lists. The Original Historical Data Patent Filing and its Children – Bill Slawski

QUOTE: ‘Penguin doesn’t… penalize anymore

I think with this release of Penguin we did achieve something really nice because, traditionally, some web algorithms used to demote sites, even entire sites. That’s not the case anymore with Penguin. This Penguin managed to, or can in fact, discard links that are bad. That’s one of the nicest changes in this Penguin: it can be way more granular than the previous releases. As I said, previous releases usually demoted whole sites, while this one can even go to a page level and discount the link, but they won’t be penalized. That’s the major thing: Penguin doesn’t work or doesn’t penalize anymore, doesn’t demote. It will just discard the incoming spam toward the site and it will just ignore the spam and that’s it. No penalty, no demotion, and it works in real time. The major signal that Penguin is looking at is these links. Basically, if there are many or if there are bad links and other kinds of signals coming towards the site, then it will just discard them, and that’s what they need to know. They can still see those links in Search Console and they can decide whether they want to disavow or remove. Gary Illyes Nov 2016

QUOTE: Re: Negative SEO or ‘toxic link campaigns

“we haven’t seen a single case, a single one, where those toxic link campaigns work” Gary Illyes Nov 2016

Quote: The ‘manual actions team… can look at the labels on the on the links or a site gets. Basically, we have tons of link labels

“the manual actions team… can look at the labels on the on the links or a site gets. Basically, we have tons of link labels; for example, it’s a footer link, basically, that has a lot lower value than an in-content link. Then another label would be a Penguin real-time label. If they see that most of the links are Penguin real-time labelled, then they might actually take a deeper look and see what the content owner is trying to do.”

“So, if you think about it, there are tons of different kinds of links on the internet. There are footer links, for example. There are Penguinized links, and all of these kinds of links have certain labels internally attached to them, basically for our own information. And if the manual actions team is reviewing a site for whatever reason, and they see that most of links are labeled as Penguin real-time affected, then they might decide to take a much deeper look on the site and see what’s up with those links and what could be the reason those links exist — and then maybe apply a manual action on the site because of the links.”

“So disavow is again, basically, just a label internally. It’s applied on the links and anchors. And then you can see that, as well. Basically, you could have like a link from, I don’t know, WhiteHouse.gov, and it has labels Penguin RT, footer and disavow. And then they would see that — they would know that someone or the webmaster or content owner is actively tackling those links.” Gary Illyes Oct 2016

Other Potential Ranking Considerations

Google was built on surfacing relevant content, and that underlying requirement is still there for Google to find and understand new pages. But it is USERS who finally determine where you rank. Of course, some of these signals can be manipulated.

To get an understanding of what Google is trying to achieve, you’ll need to know what this quality score affects.

Ranking based on ‘query deserves freshness’

Mr. Singhal introduced the freshness problem, explaining that simply changing formulas to display more new pages results in lower-quality searches much of the time. He then unveiled his team’s solution: a mathematical model that tries to determine when users want new information and when they don’t. (And yes, like all Google initiatives, it had a name: QDF, for “query deserves freshness.”) New York Times h/t Search Engine Land

Ranking based on the ‘freshness of a first document’

A system determines a freshness of a first document. The system determines whether a freshness attribute is associated with the first document. The system identifies, based on the determination, a set of second documents that each contain a link to the first document. The system assigns a freshness score to the first document based on a freshness attribute associated with each document of the set of second documents or the freshness attribute associated with the first document. The Original Historical Data Patent Filing and its Children – Bill Slawski
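
A minimal sketch of that freshness idea: when a document has no freshness attribute of its own, let it inherit the average freshness of the documents linking to it. The 0-to-1 scale and the fallback order are assumptions.

```python
def freshness_score(doc_freshness, linking_doc_freshness):
    """doc_freshness: the document's own freshness attribute (0..1) or None.
    linking_doc_freshness: freshness attributes of the documents that link to it."""
    if doc_freshness is not None:
        return doc_freshness
    if linking_doc_freshness:
        return sum(linking_doc_freshness) / len(linking_doc_freshness)
    return 0.0

print(freshness_score(None, [0.9, 0.7, 0.8]))  # inherited from fresh linking pages
print(freshness_score(0.2, [0.9, 0.9]))        # the document's own attribute wins here
```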

Ranking based on ‘document inception date

A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘a measure of how a content of a document changes over time

A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘a significant change over time in the set of topics associated with a document‘ (or website)

In one embodiment of the invention, topic extraction (e.g., through categorization, URL analysis, content analysis, clustering, summarization, set of unique low frequency words, or some other means of topic extraction) may be performed and the topic of a document monitored over time and used for scoring purposes. In one embodiment, if there is a significant change over time in the set of topics associated with a document, the search engine may consider this as an indication that link-based ranking; anchortext, or other external to the document but associated therewith and present prior to such change should be discounted. The Original Historical Data Patent Filing and its Children – Bill Slawski

Rankings (for blogs) based on a ‘quality score

A method comprising: identifying at least one of positive indicators of a quality of a blog document or negative indicators of the quality of the blog document, the identified at least one of positive indicators or negative indicators including an indicator specific to blog documents; determining a quality score for the blog document based on the identified at least one of positive indicators or negative indicators;

  1. receiving a search query;
  2. determining a score for the blog document based on a relevance of the blog document to the search query;
  3. adjusting the score of the blog document based on the quality score;
  4. and providing information relating to the blog document based on the adjusted score.
  5. one or more positive indicators include one or more of a popularity of the blog document
    1. an existence of a link to the blog document in one or more blogrolls associated with other blog documents,
    2. a tagging of the blog document,
    3. a reference to the blog document in other documents,
    4. or a pagerank of the blog document, and wherein
  6. the one or more negative indicators include
    1. one or more of a frequency with which posts are added to the blog document,
    2. a content of the blog document,
    3. a size of posts in the blog document,
    4. a link distribution associated with the blog document,
    5. a quantity of ads in the blog document,
    6. or a location of ads in the blog document. Ranking Blog Documents – http://appft1.uspto.gov/
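
Reading the claim as pseudocode, the flow is roughly: score relevance, score quality from the positive and negative indicators, then adjust one by the other. A toy version, with entirely invented weights:

```python
def blog_quality_score(positive_indicators, negative_indicators):
    """Crude net score from counts of the positive / negative indicators listed above
    (popularity, blogroll links, tagging... vs. spammy posting patterns, ad quantity
    and placement, etc.). The 0.1 / 0.2 weights are placeholders."""
    return max(0.0, 1.0 + 0.1 * positive_indicators - 0.2 * negative_indicators)

def adjusted_blog_score(relevance_score, quality_score):
    """Step 3 of the claim: adjust the relevance score by the quality score."""
    return relevance_score * quality_score

print(adjusted_blog_score(relevance_score=0.8, quality_score=blog_quality_score(4, 1)))
```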

Spam identified by ‘a spike in the number of topics

Similarly, a spike in the number of topics could indicate spam. For example, if a particular site is associated with a set of one or more topics over what may be considered a “stable” period of time, then if there is a (sudden) spike in the number of topics associated with the site, this may be an indication that the site has been taken over by “doorway” documents. Another indication may include the disappearance of the original topics associated with the site. In one embodiment of the invention, if one or more of these situations are detected, the search engine may reduce the relative score of such documents and/or the links, anchortexts or other data associated therewith and used for scoring the document. The Original Historical Data Patent Filing and its Children – Bill Slawski

Spam identified by comparing the content to ‘a collection of blog documents and feeds that evaluators rate as spam

For example, from a collection of blog documents and feeds that evaluators rate as spam, a list of words and phrases (bigrams, trigrams, etc.) that appear frequently in spam may be extracted. If a blog document contains a high percentage of words or phrases from the list, this can be a negative indication of quality of the blog document. Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Spam identified by ‘numerous posts of identical or very similar length

Many automated post generators create numerous posts of identical or very similar length. As a result, the distribution of post sizes can be used as a reliable measure of spamminess. When a blog document includes numerous posts of identical or very similar length, this may be a negative indication of the quality of the blog document. Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Spam identified by ‘a high percentage of all links from the posts or from the blog document all point to either a single web page or to a single external site

some posts are created to increase the PageRank of a particular blog document. In some cases, a high percentage of all links from the posts or from the blog document all point to either a single web page, or to a single external site. If the number of links to any single external site exceeds a threshold, this can be a negative indication of quality of the blog document.

Ranking based on ‘history of positions‘ and ‘one or a combination of other time-based factors‘ in ranking in SERPs

In one embodiment, the time-varying behavior of how a document is ranked in response to search queries to a search engine may be used to adjust the score of that document. Referring to an exemplary embodiment of the invention as implemented by a search engine for searching the Internet, the search engine may determine that a domain which jumps in rankings across many queries might be a topical site or it could signal an attempt to “spam” the search engine. In one embodiment, in addition to history of positions (or ranking) of documents for a given query, a search engine may score a document (and in the case of an Internet document, this may be done on a page, host, site, and domain basis) based on one or a combination of other time-based factors. The Original Historical Data Patent Filing and its Children – Bill Slawski

Rankings based on ‘the number of queries for which, and the rate at which (increasing/decreasing) a document is generated as a search result over time

Such factors may include the number of queries for which, and the rate at which (increasing/decreasing) a document is generated as a search result over time; seasonality, burstiness and other patterns over time that a document is generated as a search result; The Original Historical Data Patent Filing and its Children – Bill Slawski

Rankings based on ‘changes in IR scores over time for a URL-query pair

[…] or changes in IR scores over time for a URL-query pair. Alternatively or in addition, in one embodiment, a number of document (e.g., URL) independent query-based criteria may be monitored over time to improve search results. For example, in one embodiment, the average IR score among a top set of results generated in response to a given query or set of queries may be used to adjust the score of that set of results (and/or the other results) generated in response to the given query or set of queries. Moreover, the number of results generated for a particular query(ies) may be monitored over time, and, for example, if the number is increasing or there is a change in the rate of increase, those results which are generated may be scored higher (e.g., such an increase may be an indication to the search engine of a “hot topic” or other phenomenon). The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘signals that distinguish between… fly-by-night types of domains’

In one embodiment, information relating to how a document is served over a computer network (e.g., the Internet, an intranet or other network or database of documents), which information may or may not be time-based, may be used to score the relevance of the document. For example, those who attempt to deceive search engines often use throwaway or “doorway” domains, and attempt to obtain as much traffic as possible before being caught. Signals that distinguish between these fly-by-night types of domains can be used in scoring. For example, domains can be renewed up to a period of 10 years, and valuable domains are often paid for several years in advance, while doorway domains rarely are used for more than a year. The date when a domain expires in the future can be used as a factor in legitimacy of a document(s) associated therewith. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘the age of a nameserver

In one embodiment, the age of a nameserver may also be a factor in scoring. A “good” nameserver (one that should be assigned relatively higher score) will typically have a mix of different domains from different registrars and have a history of hosting those domains, while a “bad” nameserver (one that should receive a relatively lower score) might host mainly porn or doorway or domains with commercial words (a common indicator of spam), or might be brand new, or might host primarily bulk domains from a single registrar. Again, the newness of a nameserver might not automatically be a negative factor in scoring, but in combination with other factors, such as ones described herein, it could be. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘User Behaviour

In one embodiment, individual or aggregate user behavior over time may be used to score one or more documents. For example, in one embodiment of the invention, the number of times a document is selected from a set of search results and/or the amount of time one or more users spend on the document may be used to score that document. For example, if a web page is returned for a certain query, and over time or in a given time window, users spend either more or less time on average on the document given the same or similar query, then this situation may be used as an indication that the document is fresh or stale, respectively. The search engine may score the document accordingly. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on ‘time-varying characteristics of traffic’ (e.g. popularity of pages at different times)

In one embodiment of the invention, time-varying characteristics of traffic to, or other “use” of, a document by one or more users is factored into the scoring of that document. For example, a web site that has experienced a large reduction in traffic may no longer be updated or may be superseded by another site. In one embodiment of the invention, a search engine compares the average traffic to a site over the last n days (n may equal 30, for example) to the average traffic during the month where the site received the most traffic, optionally adjusted for seasonal changes, or during the last m days (e.g., m may equal 365). Optionally, in one embodiment of the invention, a search engine may identify repeating traffic patterns or perhaps a change in traffic patterns over time; e.g., a document may be more or less popular (i.e., have more or less traffic) during summer, weekends or some other seasonal time period, during and outside of which the search engine may adjust its relevancy score accordingly. The Original Historical Data Patent Filing and its Children – Bill Slawski
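
A small sketch of that traffic comparison, following the patent’s own example of comparing the last n days of traffic with the site’s best period (seasonal adjustment omitted):

```python
def traffic_decline_signal(daily_traffic, n=30):
    """Compares average traffic over the last n days with the best n-day window in
    the history. Returns a 0..1 ratio; a low value may suggest a superseded or
    abandoned site, per the patent's example."""
    if len(daily_traffic) < n:
        return 1.0
    recent = sum(daily_traffic[-n:]) / n
    best_window = max(sum(daily_traffic[i:i + n]) / n
                      for i in range(len(daily_traffic) - n + 1))
    return recent / best_window if best_window else 1.0

history = [1000] * 300 + [400] * 60  # traffic fell off in the last two months
print(traffic_decline_signal(history))  # ~0.4
```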

Ranking based on ‘advertising traffic’

Additionally, in one embodiment, time-varying factors relating to “advertising traffic” for a particular document(s) may be monitored and used for scoring a document. The Original Historical Data Patent Filing and its Children – Bill Slawski

Ranking based on revising ‘the original search query to include the candidate synonym of the particular query term‘ – Google Hummingbird?

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for:

  • Identifying a particular query term of an original search query,
  • Identifying a candidate synonym for the particular query term in context with another non-adjacent query term of the original search query that is not adjacent to the particular query term in the original search query,
  • Accessing stored data that specifies, for a pair of terms that includes the particular query term and the candidate synonym of the particular query term, a respective confidence value for the other non-adjacent query term,
  • Determining that, in the stored data, the confidence value for the other non-adjacent query term satisfies a threshold, and
  • Determining to revise the original search query to include the candidate synonym of the particular query term, based on determining that the confidence value for the other non-adjacent query term satisfies the threshold.  The Google Hummingbird Patent?
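
Taken together, those steps describe a query-rewriting routine something like the sketch below. The table shape, the threshold and the decision to expand the query rather than replace the term are all assumptions on my part.

```python
def revise_query(query_terms, synonym_table, threshold=0.7):
    """synonym_table maps (term, candidate_synonym) pairs to {context_term: confidence}.
    A synonym is added only when a NON-adjacent term of the same query has a stored
    confidence value that clears the threshold, as in the claim above."""
    revised = list(query_terms)
    for i, term in enumerate(query_terms):
        for (t, synonym), contexts in synonym_table.items():
            if t != term:
                continue
            if any(abs(i - j) > 1 and contexts.get(ctx, 0.0) >= threshold
                   for j, ctx in enumerate(query_terms)):
                revised.append(synonym)
    return revised

table = {("car", "automobile"): {"insurance": 0.9, "racing": 0.3}}
print(revise_query(["cheap", "car", "quote", "insurance"], table))
# ['cheap', 'car', 'quote', 'insurance', 'automobile']
```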

Rankings based on ‘a system that automatically generates synonyms for words from documents’

“One embodiment of the present invention provides a system that automatically generates synonyms for words from documents. During operation, this system determines co-occurrence frequencies for pairs of words in the documents. The system also determines closeness scores for pairs of words in the documents, wherein a closeness score indicates whether a pair of words are located so close to each other that the words are likely to occur in the same sentence or phrase.”

“Finally, the system determines whether pairs of words are synonyms based on the determined co-occurrence frequencies and the determined closeness scores. While making this determination, the system can additionally consider correlations between words in a title or an anchor of a document and words in the document as well as word-form scores for pairs of words in the documents”  More Ways a Search Engine Might Identify Synonyms to Expand Queries With – BILL SLAWSKI

Ranking based on ‘term relationships between terms

“Methods, systems, and apparatus, including computer program products, for scoring documents. A plurality of documents with an initial ordering is received. Local term relationships between terms in the plurality of documents are identified, each local term relationship being a relationship between a pair of terms in a respective document.”

“Relationships among the documents in the plurality of documents are determined based on the local term relationships and on the initial order of the documents. A respective score is determined for each document in the plurality of documents based on the document relationships.” Ranking Webpages Based upon Relationships Between Words (Google’s Co-Occurrence Patent) Bill Slawski

Ranking based ‘on preferences of the user, or a group of users‘ (Personalisation)

“A system receives a search query from a user and performs a search of a corpus of documents, based on the search query, to form a ranked set of search results. The system re-ranks the set of search results based on preferences of the user, or a group of users, and provides the re-ranked search results to the user.” Personalizing Search Results at Google – Bill Slawski

Ranking based on being ‘associated with a geographic location within a geographical area‘ (Geolocation)

A system may identify a first document associated with a geographic location within a geographical area and identify a second document associated with a geographic location outside the geographical area. The system may also assign a first score to the first document based on a first scoring function and assign a second score to the second document based on a second scoring function. Google Local Search Patent Application on Ranking Businesses at a Location – Bill Slawski

Ranking based on Link Analysis using Historical Data

In 2005, Google published a patent application that describes a wide range of temporal-based factors related to links, such as the appearance and disappearance of links, the increase and decrease of back links to documents, weights to links based upon freshness, weights to links based upon authoritativeness of the documents linked from, age of links, spikes in link growth, relatedness of anchor text to page being pointed to over time.  12 Google Link Analysis Methods That Might Have Changed – Bill Slawski

Propagation of Relevance between Linked Pages

Assigning relevance of one web page to other web pages could be based upon distance of clicks between the pages and/or certain features in the content of anchor text or URLs. For example, if one page links to another with the word “contact” or the word “about”, and the page being linked to includes an address, that address location might be considered relevant to the page doing that linking.  12 Google Link Analysis Methods That Might Have Changed – Bill Slawski

Link Weights based upon Page Segmentation

We’ve known for a few years that Google will give different weights for links based upon segments of a page where a link is located. It’s quite likely that something like this might continue to be used today, but it might have been modified in some manner, such as limiting in some way the amount of value a link might pass along if, for instance, it appears in the footers on multiple pages of a site.  12 Google Link Analysis Methods That Might Have Changed – Bill Slawski

Anchor Text Indexing

Using anchor text for links to determine the relevance of the pages they point towards.  12 Google Link Analysis Methods That Might Have Changed – Bill Slawski

Google’s Reasonable Surfer

“Systems and methods consistent with the principles of the invention may provide a reasonable surfer model that indicates that when a surfer accesses a document with a set of links, the surfer will follow some of the links with higher probability than others.

This reasonable surfer model reflects the fact that not all of the links associated with a document are equally likely to be followed. Examples of unlikely followed links may include “Terms of Service” links, banner advertisements, and links unrelated to the document.” Google’s Reasonable Surfer: How the Value of a Link May Differ Based upon Link and Document Features and User Data – Bill Slawski
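
A toy illustration of the reasonable surfer idea: give each outgoing link a click probability based on where it sits and what it is, so boilerplate links pass far less value than in-content links. The feature weights are invented purely to show the mechanics.

```python
def weighted_outlink_probabilities(outlinks):
    """outlinks: list of (url, features) where features is a set of strings.
    Links judged more likely to be clicked (e.g. in-content links) receive more of
    the page's authority than 'Terms of Service', footer or banner-ad links."""
    feature_weights = {"in_content": 1.0, "related_topic": 0.5, "footer": 0.1,
                       "terms_of_service": 0.05, "banner_ad": 0.05}
    raw = [max(feature_weights.get(f, 0.2) for f in features) for _, features in outlinks]
    total = sum(raw) or 1.0
    return {url: w / total for (url, _), w in zip(outlinks, raw)}

links = [("example.com/guide", {"in_content", "related_topic"}),
         ("example.com/terms", {"footer", "terms_of_service"})]
print(weighted_outlink_probabilities(links))
```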

Phrase Based Indexing

“An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.” 10 Most Important SEO Patents, Part 5 – Phrase Based Indexing

Ranking Based on ‘implicitly defined semantic structures in a document

Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query. Google Defines Semantic Closeness as a Ranking Signal – Bill Slawski

NB “This post may get you thinking about the benefits of using heading elements and lists on web pages for SEO purposes from a slightly different perspective than you may be used to.” BILL SLAWSKI

Ranking based on a blog ‘having a high number of subscriptions

A blog document having a high number of subscriptions implies a higher quality for the blog document. Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Ranking Based on links from a ‘high-quality blogroll

‘a high-quality blogroll is a blogroll that links to well-known or trusted bloggers. Therefore, a high quality blogroll that also links to the blog document is a positive indicator of the quality of the blog document.’ Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Ranking based on ‘emails or chat transcripts

For example, the content of emails or chat transcripts can contain URLs of blog documents. Email or chat discussions that include references to the blog document is a positive indicator of the quality of the blog document. Positive and Negative Quality Ranking Factors from Google’s Blog Search (Patent Application) – BILL SLAWSKI

Snippets generated by ‘the type of query or the location of the query terms in the document’

“We use many signals to decide which title to show to users, primarily the <title> tag if the webmaster specified one. But for some pages, a single title might not be the best one to show for all queries, and so we have algorithms that generate alternative titles to make it easier for our users to recognize relevant pages.”

“A document retrieval system generates snippets of documents for display as part of a user interface screen with search results. The snippet may be generated based on the type of query or the location of the query terms in the document. Different snippet generation algorithms may be used depending on the query type. Alternatively, snippets may be generated based on an analysis of the location of the query terms in the document.”  How Google Might Generate Snippets for Search Results – Bill Slawski

QUOTE: ‘we’re starting to use HTTPS as a ranking signal

we’ve been running tests taking into account whether sites use secure, encrypted connections as a signal in our search ranking algorithms. We’ve seen positive results, so we’re starting to use HTTPS as a ranking signal. For now it’s only a very lightweight signal. Google Blog

QUOTE: Speed ‘site performance is now a factor in Google rankings

“Ranking is a nuanced process and there are over 200 signals, but now speed is one of them. Know that content and relevance are still primary, but making your site faster can also help.” VIDEO: Maile Ohye

QUOTE: ‘our algorithms will eventually primarily use the mobile version of a site’s content to rank pages from that site

To make our results more useful, we’ve begun experiments to make our index mobile-first. Although our search index will continue to be a single index of websites and apps, our algorithms will eventually primarily use the mobile version of a site’s content to rank pages from that site, to understand structured data, and to show snippets from those pages in our results. Of course, while our index will be built from mobile documents, we’re going to continue to build a great search experience for all users, whether they come from mobile or desktop devices. If you have a responsive site or a dynamic serving site where the primary content and markup is equivalent across mobile and desktop, you shouldn’t have to change anything. GOOGLE

Rankings based on ‘techniques that make content less accessible to a user

‘Here are some examples of techniques that make content less accessible to a user:

(1) Showing a popup that covers the main content, either immediately after the user navigates to a page from the search results, or while they are looking through the page.

(2) Displaying a standalone interstitial that the user has to dismiss before accessing the main content.

(3) Using a layout where the above-the-fold portion of the page appears similar to a standalone interstitial, but the original content has been inlined underneath the fold. Google

Rankings based on delivering a ‘seamless mobile experience by avoiding these common mistakes‘:

  • Blocked Javascript, CSS and image files

  • Unplayable content

  • Faulty redirects

  • Mobile-only 404s

  • App download interstitials

  • Irrelevant cross-links

  • Slow mobile pages

Rankings based on ‘sites that don’t have much content “above-the-fold”’

As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change.

If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites may not rank as highly going forward. We understand that placing ads above-the-fold is quite common for many websites; these ads often perform well and help publishers monetize online content.

This algorithmic change does not affect sites who place ads above-the-fold to a normal degree, but affects sites that go much further to load the top of the page with ads to an excessive degree or that make it hard to find the actual original content on the page. This new algorithmic improvement tends to impact sites where there is only a small amount of visible content above-the-fold or relevant content is persistently pushed down by large blocks of ads.

If you believe that your website has been affected by the page layout algorithm change, consider how your web pages use the area above-the-fold and whether the content on the page is obscured or otherwise hard for users to discern quickly. You can use our Browser Size tool, among many others, to see how your website would look under different screen resolutions. Google

Rankings based on proper ‘Handling legitimate cross-domain content duplication

“For some sites, there are legitimate reasons to duplicate content across different websites — for instance, to migrate to a new domain name using a web server that cannot create server-side redirects. To help with issues that arise on such sites, we’re announcing our support of the cross-domain rel=”canonical” link element.” Google

Rankings Changing Over Periods of Time, based on if ‘spamming activities’ are detected

“A system determines a first rank associated with a document and determines a second rank associated with the document, where the second rank is different from the first rank. The system also changes, during a transition period that occurs during a transition from the first rank to the second rank, a transition rank associated with the document based on a rank transition function that varies the transition rank over time without any change in ranking factors associated with the document.”

Those practices, referred to in the patent as “rank-modifying spamming techniques,” may involve techniques such as:

  • Keyword stuffing,
  • Invisible text,
  • Tiny text,
  • Page redirects,
  • Meta tags stuffing, and
  • Link-based manipulation.”

Bill Slawski – The Google Rank-Modifying Spammers Patent
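
The patent does not spell out the transition function, but the idea can be sketched as an interpolation between the old rank and the new one over some settling period, so a spammer cannot immediately tell whether a manipulation worked. Everything below (the easing curve, the 70-day window) is an assumption, not a figure from the patent.

```python
import math

def transition_rank(old_rank, target_rank, days_since_change, settle_days=70):
    """Sketch of a rank transition function: instead of moving a document straight
    from its old rank to its new one, ease it across over ~settle_days."""
    t = min(1.0, max(0.0, days_since_change / settle_days))
    eased = 0.5 - 0.5 * math.cos(math.pi * t)  # slow start, slow finish
    return old_rank + (target_rank - old_rank) * eased

for day in (0, 10, 35, 70):
    print(day, round(transition_rank(old_rank=40, target_rank=5, days_since_change=day), 1))
```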

Ratings Based on ‘how well a page achieves its purpose

“The goal of PQ rating is to determine how well a page achieves its purpose. In order to assign a rating, you must understand the purpose of the page and sometimes the website.

2.2  What is the Purpose of a Webpage? The purpose of a page is the reason or reasons why the page was created. Every page on the Internet is created for a purpose, or for multiple purposes.

Most pages are created to be helpful for users. Some pages are created merely to make money, with little or no effort to help users. Some pages are even created to cause harm to users. The first step in understanding a page is figuring out its purpose.

Why is it important to determine the purpose of the page for PQ rating?

● The goal of PQ rating is to determine how well a page achieves its purpose. In order to assign a rating, you must understand the purpose of the page and sometimes the website.

● By understanding the purpose of the page, you’ll better understand what criteria are important to consider when evaluating that particular page.

● Websites and pages should be created to help users. Websites and pages that are created with intent to harm users, deceive users, or make money with no attempt to help users, should receive the Lowest PQ rating. More on this later. As long as the page is created to help users, we will not consider any particular page purpose or type to be higher quality than another. For example, encyclopedia pages are not necessarily higher quality than humor pages.

No matter how they are created, true lack of purpose pages should be rated Lowest quality” Google Search Quality Evaluator Guidelines 2017

Ratings based on if a page is a ‘YMYL page’ or not

“We have very high Page Quality rating standards for YMYL pages because low-quality YMYL pages could potentially negatively impact users’ happiness, health, or financial stability…. Important: For YMYL pages and other pages that require a high level of user trust, an unsatisfying amount of any of the following is a reason to give a page a Low quality rating: customer service information, contact information, or information about who is responsible for the website.” Google Search Quality Evaluator Guidelines 2017

Ratings Based on the ‘quality of the MC (Main Content of a page)’

“Main Content (MC) is any part of the page that directly helps the page achieve its purpose. MC is (or should be!) the reason the page exists. The quality of the MC plays a very large role in the Page Quality rating of a webpage.” Google Search Quality Evaluator Guidelines 2017

Ratings based on the ‘Supplementary Content‘ of a page

“Supplementary Content (SC) is also important. SC can help a page better achieve its purpose or it can detract from the overall experience.” Google Search Quality Evaluator Guidelines 2017

Ratings based on finding ‘out what real users, as well as experts, think about a website‘ and ‘Clear and Satisfying Website Information

Use reputation research to find out what real users, as well as experts, think about a website. Look for reviews, references, recommendations by experts, news articles, and other credible information created/written by individuals about the website…. When interpreting customer reviews, try to find as many as possible. Any store or website can get a few negative reviews. This is completely normal and expected. Large stores and companies have thousands of reviews and most receive some negative ones. 

Note that different locales may have their own specific standards and requirements for what information should be available on the website. You should expect to find reputation information for large businesses and websites of large organizations. Frequently, you will find little or no information about the reputation of a website for a small organization. This is not indicative of positive or negative reputation. Many small, local businesses or community organizations have a small “web presence” and rely on word of mouth, not online reviews. For these smaller businesses and organizations, lack of reputation should not be considered an indication of low page quality.

Important: For YMYL pages and other pages that require a high level of user trust, an unsatisfying amount of any of the following is a reason to give a page a Low quality rating: customer service information, contact information, or information about who is responsible for the website. Google Search Quality Evaluator Guidelines 2017

Ratings based on (E.A.T.) ‘Expertise, Authoritativeness, Trustworthiness‘ of ‘Main Content’

“Expertise, Authoritativeness, Trustworthiness: This is an important quality characteristic. …. Important: Lacking appropriate E­A­T is sufficient reason to give a page a Low quality rating.” Google Search Quality Evaluator Guidelines 2017

Ratings Based on ‘A Satisfying Amount of High-Quality Main Content

“The quality of the MC is one of the most important criteria in Page Quality rating, and informs the E-A-T of the page. For all types of webpages, creating high quality MC takes a significant amount of at least one of the following: time, effort, expertise, and talent/skill. For news articles and information pages, high quality MC must be factually accurate for the topic and must be supported by expert consensus where such consensus exists… Important: An unsatisfying amount of MC is a sufficient reason to give a page a Low quality rating.” Google Search Quality Evaluator Guidelines 2017

Ratings based on receiving a ‘Low quality’ human evaluation

“6.0 Low Quality Pages: Low quality pages are unsatisfying or lacking in some element that prevents them from achieving their purpose well. These pages lack expertise or are not very trustworthy/authoritative for the purpose of the page. If a page has one of the following characteristics, the Low rating is usually appropriate:

● The author of the page or website does not have enough expertise for the topic of the page and/or the website is not trustworthy or authoritative for the topic. In other words, the page/website is lacking E-A-T.

● The quality of the MC is low.

● There is an unsatisfying amount of MC for the purpose of the page.

● MC is present, but difficult to use due to distracting/disruptive/misleading Ads, other content/features, etc.

● There is an unsatisfying amount of website information for the purpose of the website (no good reason for anonymity).

● The website has a negative reputation.” Google Search Quality Evaluator Guidelines 2017

Ratings based on ‘Distracting/Disruptive/Misleading Titles, Ads, and Supplementary Content

“6.3 Distracting/Disruptive/Misleading Titles, Ads, and Supplementary Content: Some Low quality pages have adequate MC present, but it is difficult to use the MC due to disruptive, highly distracting, or misleading Ads/SC. Misleading titles can result in a very poor user experience when users click a link only to find that the page does not match their expectations. The Low rating should be used for disruptive or highly distracting Ads and SC.” Google Search Quality Evaluator Guidelines 2017

Ratings based on ‘Misleading Titles, Ads, or SC

“It should be clear what parts of the page are MC, SC, and Ads. It should also be clear what will happen when users interact with content and links on the webpage. If users are misled into clicking on Ads or SC, or if clicks on Ads or SC leave users feeling surprised, tricked or confused, a Low rating is justified. The Low rating should be used for disruptive or highly distracting Ads and SC. Misleading Titles, Ads, or SC may also justify a Low rating.” Google Search Quality Evaluator Guidelines 2017

Ratings based on Deceptive Page ‘Design’ & ‘purpose’

“We consider the following kinds of pages to be deceptive webpages because users did not get what they expected. Use the Lowest rating if the page is deliberately designed to manipulate users with little or no effort to provide helpful MC. Here are some common types of deceptive pages:

● Pages that disguise Ads as MC. Actual MC may be minimal or created to encourage users to click on the Ads. For example, fake search pages (example) that have a list of links that look like a page of search results. If you click on a few of the links, you will see that the page is just a collection of Ads disguised as search engine results. A “search box” is present, but submitting a new query just gives you a different page of Ads disguised as search results.

● Pages that disguise Ads as website navigation links. For example, fake directory pages (example) that look like a personally curated set of helpful links, possibly with unique descriptions. In reality, the links are Ads or links to other similar pages on the site. Sometimes the descriptions of the links are unrelated to the page.

● Pages where the MC is not usable or visible. For example, a page that has such a large amount of Ads at the top of the page (before the MC), so that most users will not see the MC, or a page where the MC is invisible text.” Google Search Quality Evaluator Guidelines 2017

Ratings based on ‘Keyword Stuffed Main Content’

“Keyword Stuffed Main Content: Pages may be created to lure search engines and users by repeating keywords over and over again, sometimes in unnatural and unhelpful ways. Such pages are created using words likely to be contained in queries issued by users. Keyword stuffing can range from mildly annoying to users, to complete gibberish. Pages created with the intent of luring search engines and users, rather than providing meaningful MC to help users, should be rated Lowest.” Google Search Quality Evaluator Guidelines 2017
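
Google does not publish how its systems detect keyword stuffing, but you can run a crude sanity check on your own main content before a human ever rates it. The sketch below is purely illustrative: the `keyword_density` function name and the 5% threshold are my own assumptions, not anything taken from Google, and real detection is far more sophisticated than raw term frequency.

```python
import re
from collections import Counter

def keyword_density(text, threshold=0.05):
    """Flag terms that make up an unusually large share of the copy.

    The 5% threshold is an arbitrary illustration, not a Google figure.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    # Report any reasonably long term whose share of all words exceeds the threshold.
    return [(term, count / total) for term, count in counts.most_common()
            if count / total > threshold and len(term) > 3]

sample = "cheap widgets cheap widgets buy cheap widgets best cheap widgets online"
for term, share in keyword_density(sample):
    print(f"{term}: {share:.0%} of all words")
```

A page that trips a check like this is not necessarily spam, and a page that passes is not necessarily safe; it is only a quick way to spot copy that reads as unnaturally repetitive before you review it properly.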

Rating based on ‘Automatically Generated Main Content

“7.4.3 Automatically Generated Main Content: Entire websites may be created by designing a basic template from which hundreds or thousands of pages are created, sometimes using content from freely available sources (such as an RSS feed or API). These pages are created with no or very little time, effort, or expertise, and also have no editing or manual curation. Pages and websites made up of auto-generated content with no editing or manual curation, and no original content or value added for users, should be rated Lowest.” Google Search Quality Evaluator Guidelines 2017

Rating based on ‘Copied Main Content

“Important: The Lowest rating is appropriate if all or almost all of the MC on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.” Google Search Quality Evaluator Guidelines 2017

QUOTE – “No duplicate content penalty

“We do have some things around duplicate content … that ARE penalty worthy.” Google’s John Mueller on best practices with website duplicate content
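
Google has never published how it groups duplicate or near-duplicate content, but if you want to compare two of your own pages for substantial overlap, word-shingling with Jaccard similarity is a standard, well-known technique. The sketch below assumes nothing about Google’s implementation; the function names and the three-word shingle size are just illustrative choices.

```python
def shingles(text, size=3):
    """Break text into overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}

def jaccard_similarity(text_a, text_b):
    """Fraction of shingles two texts share, from 0.0 (nothing) to 1.0 (identical)."""
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

page_a = "our blue widget ships free to all UK addresses and arrives next day"
page_b = "our blue widget ships free to every UK address and arrives next day"
print(f"overlap: {jaccard_similarity(page_a, page_b):.2f}")
```

A high score between two of your own URLs is a prompt to consolidate, canonicalise or rewrite, not evidence of a “penalty”; as the quote above notes, ordinary duplication is handled without one.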

Rating based on ‘Hacked, Defaced, or Spammed Pages

“Some websites are not maintained or cared for at all by their webmaster. These “abandoned” websites, especially websites that have become hacked, defaced, or spammed with a large amount of distracting and unhelpful content, should be rated Lowest. A hacked or defaced website is a site that has been modified without permission from the website owner(s). Responsible webmasters should regularly check their websites for suspicious behavior and take steps to protect users. We’ll consider a comment or forum discussion to be “spammed” if someone posts unrelated comments that are not intended to help other users, but rather to advertise a product or create a link to a website. Frequently these comments are posted by a “bot” rather than a real person. Spammed comments are easy to recognize and may include Ads, download, or other links. Webmasters should find and remove this content because it is a bad user experience. While a specific page on a website may have a large amount of spammed forum discussions or spammed user comments, it does not mean that the entire website contains only spam”. Google Search Quality Evaluator Guidelines 2017
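
The guidelines put the onus on webmasters to find and remove spammed comments, which “may include Ads, download, or other links”. If you have a large archive of user comments, a crude first-pass triage can be scripted. The heuristics below (a link-count cap and a small phrase list) are my own illustrative assumptions, nothing like a real spam filter such as Akismet, and anything flagged still needs a human look.

```python
import re

# Illustrative heuristics only; real comment-spam filtering uses many more signals.
SPAM_PHRASES = ("free download", "buy now", "casino", "click here")

def looks_spammy(comment, max_links=2):
    """Return True if a comment has many outbound links or obvious spam phrases."""
    text = comment.lower()
    links = re.findall(r"https?://\S+", text)
    return len(links) > max_links or any(phrase in text for phrase in SPAM_PHRASES)

comments = [
    "Great write-up, the section on E-A-T was really useful.",
    "click here http://spam.example/a http://spam.example/b http://spam.example/c",
]
for comment in comments:
    print(looks_spammy(comment), "-", comment[:60])
```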

Rating Based on ‘demonstrably inaccurate content

“Pages that appear highly untrustworthy should be rated Lowest, even if you’re not able to completely confirm their lack of trustworthiness.” Google Search Quality Evaluator Guidelines 2017

Ratings for ‘Pages with Error Messages or No MC

“Some pages are temporarily broken pages on otherwise functioning websites, while some pages have an explicit error (or custom 404) message. In some cases, pages are missing MC as well. Please think about whether the page offers help for users—did the webmaster spend time, effort, and care on the page?” Google Search Quality Evaluator Guidelines 2017
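
Finding “temporarily broken” pages before a rater (or a user) does is mostly a crawling job. As a rough illustration only, and assuming the widely used third-party `requests` library, the sketch below flags URLs that either return an error status or return 200 with a body that reads like an error page (a so-called soft 404). The marker phrases are my own guesses and would need tuning per site.

```python
import requests  # third-party library: pip install requests

ERROR_MARKERS = ("page not found", "no longer available", "nothing was found")

def check_page(url):
    """Very rough check for hard errors and possible soft 404s."""
    response = requests.get(url, timeout=10)
    if response.status_code >= 400:
        return f"hard error: {response.status_code}"
    body = response.text.lower()
    if any(marker in body for marker in ERROR_MARKERS):
        return "possible soft 404 (returns 200 but reads like an error page)"
    return "looks ok"

# Example usage against a list of your own URLs:
# for url in ["https://www.example.com/", "https://www.example.com/old-page"]:
#     print(url, "->", check_page(url))
```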

Ratings for ‘Promotion of Hate or Violence

“Use the Lowest rating for pages created with the sole purpose of promoting hate or violence against a group of people based on criteria including (but not limited to) race or ethnicity, religion, gender, nationality or citizenship, disability, age, sexual orientation, or veteran status. Websites advocating hate or violence can cause real-world harm.” Google Search Quality Evaluator Guidelines 2017

Demotions Based on ‘not implementing Google policies’

“I didn’t SEO at all when I was at Google. I wasn’t trying to make a site much better, but I was trying to find sites that were not ‘implementing Google policies’ and not giving the best user experience.” Ex-Google Webspam Team

Google Webmaster Guidelines

“Every single update that we make is around the quality of the site or general quality, perceived quality of the site, content and the links or whatever. All these are in the Webmaster Guidelines. When there’s something that is not in line with our Webmaster Guidelines, or we change an algorithm that modifies the Webmaster Guidelines, then we update the Webmaster Guidelines as well.” Gary Illyes



