Search Engines 

SEO 101

The Search Engines

OK, we all know what a search engine is, but you might be surprised to learn how much complexity is hidden beneath the simple text box in which you type your search. If you’re serious about your SEO campaign, you need to understand some fundamentals of search technology.

The Big Five

Five major players dominate the organic (free) search market: Google (37.6%), Yahoo (30.4%), MSN (15.6%), AOL (9.2%) and Ask (6.1%). The diagram below illustrates their approximate market share as of July 2005:

Figure: Search Engine Market Share, July 2005

Note that the chart shows searches performed, so it requires some interpretation. AOL, for example, acquires its search results (organic and paid) through its partnership with Google. This boosts Google's effective market share to almost 47% and means that your SEO campaign need not do anything special to target AOL.
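The adjustment is simple addition: because AOL's results come from Google, AOL's share of searches is credited to Google as the engine that actually supplies the results. A minimal Python sketch using the July 2005 figures quoted above; the `provider` mapping and the `share_by_provider` helper are illustrative, and the mapping assumes the AOL/Google partnership is the only such arrangement.

```python
# July 2005 share of searches performed, as quoted in the text (percent).
searches = {"Google": 37.6, "Yahoo": 30.4, "MSN": 15.6, "AOL": 9.2, "Ask": 6.1}

# Which engine actually supplies each site's results.
# Assumption: only AOL uses another engine's results (Google's).
provider = {"Google": "Google", "Yahoo": "Yahoo", "MSN": "MSN", "AOL": "Google", "Ask": "Ask"}

def share_by_provider(searches, provider):
    """Sum each site's share of searches under the engine that supplies its results."""
    totals = {}
    for site, share in searches.items():
        source = provider[site]
        totals[source] = totals.get(source, 0.0) + share
    return totals

totals = share_by_provider(searches, provider)
print(round(totals["Google"], 1))  # 46.8
```

The 46.8% result is the "almost 47%" mentioned above; AOL disappears from the totals because, from an SEO point of view, optimizing for Google covers it.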

Google

The search engine that spawned a verb: "Googling" is now in the dictionary, and Google shows little sign of losing its grip on its large market share. Many people concentrate on Google to the extent that they forget about everyone else. This can be a costly mistake, but there's no doubt that Google is the leader in the field. Experts and ordinary users alike perceive Google as the leader in 'quality' as well as market share. Many factors lie beneath this 'quality' label, and we discuss them in more detail below.

Yahoo

According to the Alexa rankings, the Yahoo domain is the most visited in the world. With this massive pool of users to draw on, Yahoo Search remains an important player.

Live (MSN) Search

Microsoft's search engine relied for years upon the Inktomi index to provide its results. However, as of 2005 Inktomi is owned by Yahoo, a situation Microsoft appears to find unacceptable. Live (MSN) Search now relies upon Microsoft's own crawler (MSNBot) to provide organic results.

The others

There are literally hundreds of search engines out there, some good, some bad. None of them comes close to the major engines above in the sheer volume of traffic they can generate, but some are certainly worth a look, particularly if your site's content is specialized: many "special interest" search engines target material related to their chosen topic.

How Do Search Engines Work?

You can think of a search engine as made up of three main parts:

  • Crawler – A crawler (sometimes called a spider) is a piece of software whose purpose in life is to visit websites and record information about them. This information is analyzed and stored in an index (discussed below). GoogleBot, MSNBot and Slurp (Yahoo) are names you will become intimately familiar with as you progress through your SEO campaign.
  • Index – The index is a data store in which the search engine keeps all the information discovered by the crawler, along with the results of its post-crawl analysis. Most of the search engine 'magic' happens here, and it has already happened long before you click the search button.
  • Query Analyser – This is the piece of software that tries to make sense of the words you typed into the box. Different search engines take different approaches to this. Ask, for example, has always placed an emphasis on natural language processing, allowing (and expecting) the user to type a question into the search box. Other search engines concentrate on interpreting keyword lists. Even in this intuitively simpler case, divining a user's genuine intention from what they type into the box is extremely difficult. Google is currently widely regarded as doing the best job of correctly interpreting what the user wants.
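The three parts can be illustrated with a toy Python sketch. Everything in it is hypothetical: the in-memory `WEB` dictionary stands in for real websites, the `example.com` URLs are placeholders, and the query analyzer simply treats the query as a keyword list and returns the pages that contain every keyword. Real engines are vastly more sophisticated, but the division of labor is the same: crawling and indexing happen ahead of time, and only the query step runs when you click search.

```python
import re
from collections import deque

# A toy, in-memory "web": URL -> (page text, outgoing links).
# Hypothetical data; a real crawler fetches pages over HTTP.
WEB = {
    "http://example.com/": ("Welcome to our fish shop",
                            ["http://example.com/tanks", "http://example.com/food"]),
    "http://example.com/tanks": ("Fish tanks and aquarium filters", ["http://example.com/"]),
    "http://example.com/food": ("Tropical fish food", ["http://example.com/contact"]),
    "http://example.com/contact": ("Contact us by phone", []),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(seed):
    """Crawler: visit pages breadth-first from the seed, following each link once."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each word to the set of URLs containing it (an inverted index)."""
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Query analyzer: treat the query as keywords; return pages containing all of them."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Crawling and indexing happen long before anyone searches.
index = build_index(crawl("http://example.com/"))
print(sorted(search(index, "Fish food")))  # ['http://example.com/food']
```

Three pages mention "fish", but only one contains both "fish" and "food", so the keyword query narrows the result to that page. Note that the query step never touches the pages themselves, only the prebuilt index.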

A picture is worth a thousand words, so the diagram below illustrates what goes on when (and before) you click the search button:

Figure: Search Engine Process

What Makes a Good Search Engine?

Each of the engines has its own strengths and weaknesses. In my view, four major factors contribute to search engine quality:

  • Capture – Refers to how many of the correct results are returned by your search (in information retrieval this is known as recall). If we imagine a small web made up of 100 pages, of which 3 are about fish, then an engine that returns those three documents for a search for ‘fish’ has a capture ratio of 100% (for this example). This sounds great, until we realize that a dumb search engine that always returns every document will also score 100%. As a user I would have trouble finding the pages of interest to me, so this measure of success is obviously not enough.
  • Exclusion – Refers to how many of the incorrect results are not returned by your search. The higher this score, the less ‘noise’ (irrelevant results) the user needs to wade through to find what he wants. In the above example, a search engine that returned only the three ‘fish’ pages and no others would have an exclusion score of 100%, since none of the 97 pages unrelated to fish were returned. The dumb engine that returns every document, by contrast, scores 0%.
  • Ranking – Of the three pages returned, in what order are they presented to the user within the SERPs (search engine results pages)? Most SEO topics are concerned with this question. An algorithm that places more relevant pages higher up the results is more effective at delivering the content a user was looking for.
  • Presentment – Once all of the above has taken place, the search engine needs to decide how to display the results to the user: for example, whether to show the meta description or a closely matching piece of the page's content, and how to choose that content. Returning good results is of little value if users cannot tell which of them is likely to give them what they need, and so cannot decide which to click.
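The capture and exclusion measures can be made concrete with the fish example. A minimal Python sketch, assuming pages are identified by hypothetical integer IDs 0 to 99, with pages 0 to 2 being the fish pages; in information retrieval terms, capture is recall and exclusion is the true negative rate (specificity). The function names are illustrative only.

```python
def capture(returned, relevant):
    """Fraction of the relevant pages that the engine returned (recall)."""
    return len(returned & relevant) / len(relevant)

def exclusion(returned, relevant, all_pages):
    """Fraction of the irrelevant pages that the engine did NOT return."""
    irrelevant = all_pages - relevant
    return len(irrelevant - returned) / len(irrelevant)

# The example from the text: a 100-page web in which pages 0-2 are about fish.
all_pages = set(range(100))
fish_pages = {0, 1, 2}

good_engine = {0, 1, 2}  # returns exactly the fish pages
dumb_engine = all_pages  # returns every document for every query

print(capture(good_engine, fish_pages), exclusion(good_engine, fish_pages, all_pages))  # 1.0 1.0
print(capture(dumb_engine, fish_pages), exclusion(dumb_engine, fish_pages, all_pages))  # 1.0 0.0
```

Both engines score perfectly on capture, which is why capture alone is not enough; only exclusion separates them. Neither measure says anything about ordering, which is where ranking comes in.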

Note: Several other factors, such as performance, could be added here, but since performance isn't really an issue with any of the well-known search engines, we can safely ignore it.