Indexes and web search engines

Search engines collect data using bots, also known as crawlers or spiders. They parse the collected text, breaking each string of characters into keywords according to grammatical rules, and finally store those keywords in their own, unique index. We search an index, not the web.
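To make the collect, parse and store steps concrete, here is a minimal sketch in Python. It is only an illustration: the PAGES dictionary stands in for whatever a bot would have fetched, and the URLs and the names tokenize and build_index are assumptions made for this example, not how any particular search engine works.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a bot would have collected.
PAGES = {
    "http://example.com/a": "Search engines use bots to crawl the web.",
    "http://example.com/b": "Bots are also called crawlers or spiders.",
}

def tokenize(text):
    """Parse a string of characters into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Store each keyword alongside the set of pages it was found on."""
    index = defaultdict(set)
    for url, text in pages.items():
        for keyword in tokenize(text):
            index[keyword].add(url)
    return index

index = build_index(PAGES)
```

A search then only has to look a keyword up in index; the pages themselves are never re-read at query time.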

What is an index?

“A collection of information stored on a computer or on a set of cards, in alphabetical order”
Cambridge Dictionaries Online

An index is a table of keywords found by search engine bots. Each search engine builds its own index, so each one stores, and therefore “sees”, different parts of the web. This explains why Bing’s results differ from Google’s, for instance.

Building an index is not a simple process. You need to design a reliable bot, and you also need to think about the web as a communication medium: its infrastructure is large, unpredictable and time-consuming to crawl. Indexing the web is complex because that infrastructure does not allow for an organised web.

Filing cabinets are similar to web search engine indexes. From 401(K) 2012.

Google’s Inside Search states that indexing is the start of a searcher’s search journey. A search engine must build a good index, one that allows for easy retrieval by users. An index is therefore key to a good search engine.

Indexes and location

Indexes are one-dimensional in that they are made up of two items: keywords and ID numbers. This means that once you type, say, “gerald murphy digital”, these terms are looked up in the index and matched to IDs, rather than the search engine scanning pages for the individual keywords “gerald”, “murphy” and “digital”.
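A rough sketch of this keyword-to-ID lookup in Python is shown here. The documents, their ID numbers and the function names are all made up for illustration, and requiring every keyword to match is an assumption of this sketch rather than a description of any real search engine.

```python
import re

# Hypothetical documents, each identified by an ID number.
DOCUMENTS = {
    1: "Gerald Murphy writes about digital marketing",
    2: "Digital cameras reviewed",
    3: "Murphy's law explained",
}

def keywords(text):
    """Split text into lowercase keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each keyword to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in keywords(text):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every keyword in the query."""
    id_sets = [index.get(word, set()) for word in keywords(query)]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)

INDEX = build_index(DOCUMENTS)
print(search(INDEX, "gerald murphy digital"))
```

Each query term is turned into a set of IDs, and only the IDs common to all terms are returned; the document text is never consulted during the search.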

If keywords are location-specific, a location may be stored in the index too, so that each entry contains three items. This means that some search engine indexes are two- or even three-dimensional. Search engines have to adapt to new technologies such as 3G and location-based searching.
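One way to picture an index with a third item is sketched below in Python. The postings, place names and function names are hypothetical, and the nested-dictionary layout is just one simple choice for illustration; it is not the hybrid structure described by Zhou et al.

```python
from collections import defaultdict

# Hypothetical entries with three items: keyword, location, document ID.
POSTINGS = [
    ("pizza", "manchester", 10),
    ("pizza", "london", 11),
    ("pizza", None, 12),  # a page with no location attached
    ("taxi", "manchester", 13),
]

def build_location_index(postings):
    """Map keyword -> location -> set of document IDs."""
    index = defaultdict(lambda: defaultdict(set))
    for keyword, location, doc_id in postings:
        index[keyword][location].add(doc_id)
    return index

def search(index, keyword, location=None):
    """Return IDs for a keyword, optionally restricted to one location."""
    by_location = index.get(keyword, {})
    if location is None:
        # No location given: combine the IDs stored under every location.
        result = set()
        for ids in by_location.values():
            result |= ids
        return result
    return set(by_location.get(location, set()))

LOCATION_INDEX = build_location_index(POSTINGS)
```

Calling search(LOCATION_INDEX, "pizza", "manchester") narrows the result to IDs stored under that location, while leaving the location out falls back to a plain keyword lookup.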

Because indexes use IDs, once a keyword has been typed the most up-to-date IDs can be retrieved, which allows search engine results pages (SERPs) to display and rank fresh content for a given query. It is worth noting, however, that no index is ever completely up to date, although most of our searches are close to real-time.
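As a simplified illustration of retrieving the most up-to-date IDs, the Python sketch below stores a crawl date next to each ID and returns the newest first. The IDs, dates and the function name freshest are invented for the example, and real ranking relies on many more signals than recency.

```python
from datetime import date

# Hypothetical index: keyword -> list of (document ID, date last crawled).
INDEX = {
    "election": [
        (101, date(2013, 9, 1)),
        (102, date(2013, 9, 24)),
        (103, date(2013, 6, 15)),
    ],
}

def freshest(index, keyword, limit=10):
    """Return up to `limit` document IDs for a keyword, newest crawl first."""
    entries = index.get(keyword, [])
    ranked = sorted(entries, key=lambda entry: entry[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:limit]]

print(freshest(INDEX, "election", limit=2))
```

The dates only record when the bot last visited, which is why the results can be close to real-time but never fully current.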

If you liked this post, you might want to try searching various indexes. By doing so you are searching more of the web, which bypasses the limitation of using just Google.
