The Search Engines
OK, we all know what a search engine is, but you might be surprised to learn how much complexity is hidden
beneath the simple text box in which you type your search. If you’re serious about your SEO campaign, you
need to understand some fundamentals of search technology.
The Big Five
Five major players, the big boys, dominate the organic (free) search market: Google (37.6%), Yahoo (30.4%), MSN (15.6%),
AOL (9.2%) and Ask (6.1%). The diagram below illustrates the approximate market share as of July 2005:
Search Engine Market Share July 2005
Note that the chart shows the share of searches performed and therefore requires some interpretation.
AOL, for example, acquires its search results (organic and paid) through its partnership with Google. This boosts
Google's effective market share to almost 47% (37.6% + 9.2%) and means that your SEO campaign need do nothing special to target AOL.
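The adjustment is plain addition: because AOL's results come from Google, AOL's share can be folded into Google's. A minimal Python sketch using the July 2005 figures quoted above (the variable names `share` and `effective` are illustrative only):

```python
# Share of searches performed, July 2005 (percentages from the chart above).
share = {"Google": 37.6, "Yahoo": 30.4, "MSN": 15.6, "AOL": 9.2, "Ask": 6.1}

# AOL acquires its results (organic and paid) from Google, so for SEO purposes
# an AOL search is effectively a Google search: move AOL's share onto Google.
effective = dict(share)
effective["Google"] += effective.pop("AOL")

# Print the adjusted shares, largest first.
for engine, pct in sorted(effective.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{engine:<7}{pct:5.1f}%")
```

Google ends up at 46.8%, which is the "almost 47%" figure used above.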
Google
Google is the search engine that spawned a verb: "Googling" is now in the dictionary,
and Google shows little sign of losing its grip on its large market
share. Many people concentrate on Google to the extent that they
forget about everyone else. This can be a costly mistake, but there's no doubt
that Google is the leader in the field. Experts and ordinary users alike perceive
Google as the leader in terms of 'quality' as well as market share. Many factors lie
beneath this 'quality' label, and we discuss them in more detail below.
Yahoo
According to the Alexa rankings, the Yahoo domain is the most visited in the
world. With this massive pool of users to draw on, Yahoo Search remains an
important player.
Live (MSN) Search
Microsoft's search engine relied for years upon the Inktomi index to provide its
results. However, as of 2005 Inktomi is owned by Yahoo, a situation Microsoft appears
to find unacceptable. Live (MSN) Search now relies upon Microsoft's own crawler (MSNBot)
to provide organic results.
The others
There are literally hundreds of search engines out there, some good, some bad.
None of them comes close to the big three in terms of the sheer volume of
traffic they can generate, but some of them are certainly worth a look,
particularly if the content of your site is specialized. Many
"special interest" search engines target material related to
their topic of choice.
How Do Search Engines Work?
You can think of a search engine as being made up of three main parts:
-
Crawler
– A crawler (sometimes called a spider) is a piece of software whose purpose in life
is to visit websites and record information about them. This information is analyzed and
stored in an index (discussed below). Googlebot, MSNBot and Slurp (Yahoo) are names you will
become intimately familiar with as you progress through your SEO campaign.
-
Index
– The index is a data store holding all the information discovered by
the crawler, along with the results of the search engine's post-crawl analysis. Most of the search engine 'magic'
happens here, and it has already happened long before you click the search button.
-
Query Analyser
– This is the piece of software that tries to make sense of the words you typed into the box. Different
search engines take different approaches to this. Ask, for example, has always placed an emphasis on
natural language processing, allowing (and expecting) the user to type a question into the search box. Other search
engines concentrate on interpreting keyword lists. Even in this intuitively simpler case, divining a user's
genuine intention from what they type into the box is extremely difficult. Google is currently widely
accepted as doing the best job of correctly interpreting what the user wants.
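To make the three parts concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not how Googlebot, MSNBot or Slurp actually work: the "web" is a hypothetical in-memory dictionary rather than real HTTP fetches, the index is a simple inverted index (each word mapped to the pages containing it), and the query analyser only lowercases the query, drops a small made-up stopword list and returns pages containing every remaining keyword. The names `WEB`, `STOPWORDS`, `crawl`, `build_index` and `search` are all invented for this example.

```python
import re
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch each page over HTTP; this dict stands in for that.
WEB = {
    "/": ("Welcome to our aquarium shop", ["/fish", "/tanks"]),
    "/fish": ("Tropical fish and goldfish for sale", ["/", "/care"]),
    "/tanks": ("Glass tanks and filters for fish", ["/"]),
    "/care": ("How to care for goldfish", []),
}

# Made-up stopword list used by the toy query analyser.
STOPWORDS = {"a", "and", "for", "how", "of", "our", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(start):
    """Crawler: visit pages breadth-first by following links, recording each page's text."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map each word to the set of URLs whose text contains it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Query analyser: drop stopwords, then return the pages containing every remaining term."""
    terms = [t for t in tokenize(query) if t not in STOPWORDS]
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index itself is not modified
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

index = build_index(crawl("/"))  # done long before anyone types a query
print(sorted(search(index, "fish for sale")))  # → ['/fish']
```

Note that `crawl` and `build_index` run once, ahead of time; `search` only consults the finished index. That split is why most of the work has already happened by the time you click the search button.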
A picture is worth a thousand words, so the diagram below illustrates what goes on when (and before) you
click the search button:
What Makes a Good Search Engine?
Each of the engines has its own strengths and weaknesses. Below are the four major factors that
(in my view) contribute to search engine quality:
-
Capture
– Refers to how many of the correct results are returned by your search. If we imagine a small web made up of
100 pages, of which 3 are about fish, then an engine that returns those three documents for a search for
‘fish’ has a capture ratio of 100% (for this example). This sounds great, until we realize that a dumb search
engine that always returns every document will also score 100%. As a user I would have trouble
finding the pages of interest to me, so this measure of success is obviously not enough on its own.
-
Exclusion
– Refers to how many of the incorrect results are not returned by your search. The higher this score, the
less ‘noise’ (irrelevant results) the user needs to wade through to find what he wants. In the above example,
a search engine that returned only the three ‘fish’ pages and no others would have an exclusion score of
100%, since none of the 97 pages unrelated to fish was returned. The dumb engine that returns every document, by contrast, scores 0%.
-
Ranking
– Of the three pages returned, in what order are they presented to the user within the SERPs (search engine results pages)? Most
of the topics within SEO are concerned with this question. An algorithm that effectively places more relevant pages
higher up the results is more useful and more effective at delivering the content a user was looking for.
-
Presentment
– After all the above has taken place, the search engine needs to decide how to display the results to
the user: for example, should it show the meta description or a closely matching piece of the page content, and if the latter,
how does it choose that content? Returning good results is of little value if users cannot tell which of them is
likely to provide what they need, and so cannot decide which to click.
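Capture and exclusion are simple ratios; in information-retrieval terms they correspond to what is usually called recall and specificity. A minimal Python sketch applying them to the 100-page "fish" example above. The page numbering, and the third "sloppy" engine that misses one fish page and returns five unrelated ones, are hypothetical and added only for comparison.

```python
def capture(returned, relevant):
    """Fraction of the relevant pages that the search returned."""
    return len(returned & relevant) / len(relevant)

def exclusion(returned, relevant, all_pages):
    """Fraction of the irrelevant pages that the search did NOT return."""
    irrelevant = all_pages - relevant
    return len(irrelevant - returned) / len(irrelevant)

all_pages = set(range(100))  # a tiny web of 100 pages, numbered 0..99
fish = {0, 1, 2}             # the 3 pages that are about fish

engines = {
    "perfect": fish,                        # returns exactly the fish pages
    "dumb": all_pages,                      # returns every page
    "sloppy": {0, 1} | set(range(10, 15)),  # misses one fish page, adds 5 others
}

for name, returned in engines.items():
    print(f"{name:<8} capture={capture(returned, fish):6.1%} "
          f"exclusion={exclusion(returned, fish, all_pages):6.1%}")
```

The dumb engine scores 100% capture but 0% exclusion, while the sloppy one scores about 66.7% capture and 94.8% exclusion (92 of the 97 unrelated pages left out). Neither number alone says whether an engine is good, and neither says anything about ranking or presentment.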
Note: Several other factors, such as performance, could be added here, but since performance isn’t really an issue with any of the well-known search engines, we can safely ignore it.