To a lot of people – myself included prior to this job – Google is some sort of magic box for every possible query under the sun. “How does it work?” is usually met with shrugged shoulders and a puzzled expression, but not today. So let’s get into it.
People add new things to the web every second of every day: a new blog post, a new web page, a Facebook post, a tweet, or any other fresh content. Alongside this, Google’s bots crawl the web, discovering and indexing that fresh content, following links from page to page and keeping track of everything they touch, including (but by no means limited to): links to and from pages, the quality of a site’s content, any ad copy present, and how users interact with the site.
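The crawl-and-index idea above can be sketched in a few lines. This is a toy illustration over a made-up in-memory “web” (page name mapped to text and outbound links) – real crawlers fetch pages over HTTP and deal with robots.txt, scheduling, deduplication and far more, and this is in no way Google’s actual code:

```python
from collections import defaultdict

# Hypothetical tiny "web": page -> (page text, outbound links).
WEB = {
    "home":   ("welcome to our climbing blog", ["gear", "routes"]),
    "gear":   ("reviews of climbing gear and ropes", ["home"]),
    "routes": ("classic climbing routes in the alps", ["gear"]),
}

def crawl(start):
    """Follow links from `start`, building an inverted index: word -> set of pages."""
    index = defaultdict(set)
    seen, frontier = set(), [start]
    while frontier:
        page = frontier.pop()
        if page in seen:
            continue
        seen.add(page)
        text, links = WEB[page]
        for word in text.split():
            index[word].add(page)   # record every page a word appears on
        frontier.extend(links)      # keep following links, crawler-style
    return index

index = crawl("home")
print(sorted(index["climbing"]))
```

Answering a query is then just a lookup in the inverted index – the hard part, as the rest of this article shows, is deciding which of the matching pages deserve to rank first.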
All of this provides Google with a score of how relevant and valuable a site is, and it goes on all the time. Google are pretty good at keeping interested parties up to date with algorithm changes and overhauls, and generally announce major updates in advance, so let’s look at a few of the big ones.
Now we are in a position to dive into the development of this web index. In 2010, Google announced ‘Caffeine’ – the first major iteration. When introducing the change, Google cited not just the web’s rapid expansion but its growing richness, saying “the average webpage is richer and more complex”, and as such a better, more dynamic and intelligent indexing system was needed.
The first overhaul to Caffeine came in 2011 with the rollout of Google Panda – named after an engineer who worked on the project. The key targets of Panda were sites with spam-heavy, duplicate or otherwise poor-quality content, and the aim was to shift the quality of search results towards sites with solid, original content. For interested readers (with some time to kill), the patent overview is available here. At the time, it was estimated that v1.0 overhauled the search rankings for 12% of all Google’s search traffic, and some affected sites reported a drop in traffic of over 90% after the rollout. After receiving a fair amount of backlash from webmasters who felt they had been unfairly penalized, Google published some guidelines to help websites understand the changes and avoid being falsely targeted – available here.
Panda continued merrily sifting, searching, and indexing the web for a year or so before Google Penguin arrived. The aim of Penguin was to demote websites that violated Google’s Webmaster Guidelines. These had been heavily abused since Panda’s release, as webmasters began manipulating the algorithm to falsely promote their sites. Such techniques have since been referred to as ‘black-hat SEO’. One example is link farming, which games the ranking signal that rewards sites with quality inbound links: ‘link-sharing’ schemes popped up whereby a webpage could pay for links from other websites in the scheme in exchange for linking back to them. In this way, a group of poor-quality websites could rise up the search rankings and appear above quality sites. Penguin’s first rollout hit around 3.1% of all English searches.
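To see why link schemes worked, it helps to look at the idea behind link-based ranking. The sketch below is a minimal PageRank-style power iteration over a hypothetical graph – an illustration of the general principle only, nothing like Google’s real ranking stack. In it, three “farm” pages all point at a spam page, which ends up scoring higher than a page with a single genuine link:

```python
def pagerank(links, iters=50, d=0.85):
    """Crude PageRank: repeatedly share each page's score among its outlinks."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iters):
        # every page keeps a small teleport share, then receives link shares
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical graph: "quality" earns one genuine link from "blog",
# while three farm pages all funnel their score into "spam".
graph = {
    "blog":    ["quality"],
    "quality": ["blog"],
    "farm1":   ["spam"],
    "farm2":   ["spam"],
    "farm3":   ["spam"],
    "spam":    ["farm1"],
}

rank = pagerank(graph)
print(rank["spam"] > rank["quality"])   # the farm inflates the spam page
```

Penguin’s job, in essence, was to detect and discount exactly this kind of artificial link structure.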
Google’s zoo of algorithm updates was then joined by Hummingbird. This addition represents the development of semantically intelligent search, where the algorithm started to determine the meaning of an entire query rather than analysing and matching it word by word, with the hope that search would surface pages that answer the meaning of a question and not just those pages that hit each keyword.
By now, Google wanted to answer simple questions within the results page itself, and Hummingbird was the first step. It is Hummingbird at work when you google ‘Height of Ama Dablam’ and the answer appears directly inside the results page.
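The difference between literal keyword matching and meaning-aware matching can be shown with a deliberately crude toy. Here a hand-made synonym map stands in for what Hummingbird does with far more sophisticated models – every name and document below is invented for illustration:

```python
# Hypothetical documents and a crude synonym map -- purely illustrative.
DOCS = {
    "doc1": "how tall is ama dablam",
    "doc2": "ama dablam height and climbing history",
}
SYNONYMS = {"height": {"height", "tall"}, "tall": {"tall", "height"}}

def keyword_match(query, doc):
    """Literal matching: every query word must appear verbatim."""
    return all(w in doc.split() for w in query.split())

def semantic_match(query, doc):
    """Crude 'semantic' matching: a synonym of each query word will do."""
    words = set(doc.split())
    return all(words & SYNONYMS.get(w, {w}) for w in query.split())

query = "ama dablam height"
print(keyword_match(query, DOCS["doc1"]))   # misses: "tall" != "height"
print(semantic_match(query, DOCS["doc1"]))  # finds it via the synonym map
```

A page phrased as “how tall is Ama Dablam” answers the height question perfectly well; matching on meaning rather than exact keywords is what lets it rank for the query.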
Another bird to add to the roost, Google Pigeon arrived to promote local businesses in Google search. “Aimed at providing a more useful, relevant and accurate local search results”, this update also overhauled the Google Maps feature to provide relevant results based on location and local directories. Pigeon is in action in the picture below for a search about local gyms. The impact this had on local business was huge, with many looking to solidify their presence in directories and business listings to help Google rank them higher in the search results.
The aptly named ‘Pirate’ is Google’s answer to growing pressure from Hollywood and the entertainment industry to combat pirated content online. The algorithm identifies sites with a large number of valid copyright removal notices against them and penalizes them in the search rankings. Alongside steering traffic away from illegal sources of media, the hope was that Google would promote genuine sources of music, video and film sales.
A huge number of updates and revisions have been made since Pirate was installed – Google update their engine hundreds of times a year, and we’d be here until the cows come home talking you through them all – but luckily a handy changelog can be found here for the interested reader!
So we’ve covered some of the heavy hitters inside Google HQ; the real question is, what’s next?
One thing we can be pretty sure of is that more development time will be put into updates like Hummingbird, with a focus on semantic search improvements and answering searches inside the SERP itself. Another likely bet, given the tidal shift towards mobile, is more innovation towards making web search friendlier and easier for mobile users. Whatever the case, you can be sure Google are working on something, so come back when it drops and we’ll keep you updated!