
What information can Google not store?

Surface vs deep web

There are two types of web. The surface web is the part that search engines can see and index, for example BBC News. The deep web refers to the parts that search engines cannot access; for instance, online banking information is hidden behind a password wall. So Google cannot store anything from the deep web because its crawlers cannot crawl past firewalls, passwords or any other restricted access point.
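As a rough illustration of that split, here is a minimal Python sketch of the decision an anonymous crawler faces. It assumes the crawler only looks at the HTTP status code and at whether it was sent to a login form. The `Response` class, the `requires_login` flag and the example.* addresses are all hypothetical, and real crawlers are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Response:
    url: str
    status: int                   # HTTP status code returned to the crawler
    requires_login: bool = False  # hypothetical flag: the page showed a login form


def is_indexable(response: Response) -> bool:
    """Return True if an anonymous crawler could read and index the page."""
    if response.status in (401, 403):  # unauthorised or forbidden
        return False
    if response.requires_login:        # content sits behind a password wall
        return False
    return response.status == 200


pages = [
    Response("https://news.example/headlines", 200),
    Response("https://bank.example/", 200),
    Response("https://bank.example/account/statement", 401),
    Response("https://bank.example/transfer", 200, requires_login=True),
]

surface = [p.url for p in pages if is_indexable(p)]
deep = [p.url for p in pages if not is_indexable(p)]
```

In this sketch the bank's home page ends up on the surface web, while the statement and transfer pages stay in the deep web.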


Photo showing different parts of the web. Source: ConetIslandDreams

Cookies, IP address and browsers

Any information you give, for example your username and password, can be stored by Google, as can other indirect pieces of information such as cookies, IP addresses and browser details. Some of these technologies are not as clear as others. Cookies, for example, are not explained line by line, so their precise use is simply not known. Can a cookie take note of your IP address, which ISP you use, or where you live? It is not impossible for Google to trace an IP address to a specific location. In fact, other pieces of technology can pinpoint your location.
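To make the cookie question a little more concrete, here is a minimal Python sketch that breaks a cookie down field by field using the standard library's `http.cookies` module. The header value, the name `session_id` and the domain are made up for illustration; they are not taken from Google. As far as the format goes, a cookie is only a name, a value and a few attributes. It does not carry your IP address itself, although the server that set it can see the IP address of every request that sends the cookie back.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie value, invented for this example.
raw_header = (
    "session_id=abc123; Domain=.example.com; Path=/; "
    "Max-Age=3600; Secure; HttpOnly"
)


def describe_cookie(header: str, name: str) -> dict:
    """Break one cookie down into its name, value and attributes."""
    jar = SimpleCookie()
    jar.load(header)
    morsel = jar[name]
    return {
        "name": morsel.key,
        "value": morsel.value,
        "domain": morsel["domain"],
        "path": morsel["path"],
        "max_age_seconds": morsel["max-age"],
        "secure": bool(morsel["secure"]),
        "http_only": bool(morsel["httponly"]),
    }


info = describe_cookie(raw_header, "session_id")
for field, value in info.items():
    print(f"{field}: {value}")
```

What the value (here `abc123`) actually means is decided on the server, which is why the precise use of a cookie is hard to judge from the outside.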

Case study of Google Street View

It is not uncommon for large companies to use and misuse information gathered through their products and services. Google was ordered to delete data that its Street View cars collected while they took pictures for the Street View service. So Google has misused and stored lots of unauthorised information, and can be referred to as being “evil”.


A devil theme to Google’s logo. Source: 4.bp

What information does Google store?

Google is likely to archive most things from the surface web. Your bank's website, as an example of a surface web website, is likely to be crawled and stored in an index, but Google cannot search or store your bank account information because it is hidden behind password walls and secure servers, and is therefore considered to be within the deep web…

Android viewpoint

If you own and use an Android mobile, Google may be able to collect even more information about you. Phone numbers and call records can be stored. Is the future of Google’s business model likely to offer you cheap flights to Australia if you frequently call a person over there?

If you are interested in what information Google can store, read the references below this post to learn more. Would you like me to post about a specific search engine topic? Tweet Gerald.

Posted by Gerald Murphy

References

  1. Channel 4. (no date) What does Google know about you?
  2. Google. (2013) Google’s privacy policy.
  3. Peng, W. (2000) HTTP cookies – a promising technology. Online Information Review. 24(2) pp. 150–153.
  4. Rawlinson, K. (2013) Google ordered to delete data collected by Street View cars.

What might a librarian call a ‘quality’ source?

In order to find out why people within the information industry pay for information, I asked 14 librarians to fill in a self-completed questionnaire. This post identifies what makes quality information*.

What makes subscription services better?

There is no single answer to this question. The responses seem to suggest that paid subscription services offer better quality information. So, what makes quality information? The following (ordered) list addresses what MMU librarians suggest is quality information. The number in parentheses (round brackets) indicates how many librarians in this survey shared that view.

  1. Peer reviewed (8)
  2. Academic (4)
  3. Authoritative (4)
  4. Better quality (4)
  5. Relevant results (3)

What makes quality information?

From a personal viewpoint, a good search is one whose results are relevant to the search query. But defining quality can be a little tricky because there are so many words to describe quality.

From an interface viewpoint
There is less clutter on subscription services. Furthermore, this interface is uniform (i.e. it does not change for certain users).

From a search engine results page (SERP) viewpoint
Quality results are in date, accurate (because they match the search query), drawn from a bigger index (possibly because the deep web is used in their resources), relevant (because each hit is related to the search query) and more focused.

From a creator’s viewpoint
Subscription services offer editorial / scholarly resources. This has an impact on quality control. For this reason, subscription services are unlikely to have completely irrelevant hits. It is also worth noting that anyone can publish on the Surface Web.

Do paid services offer better quality information?

The short answer is yes. Results from paid subscription services are more likely to be relevant to a search query. It is worth noting, though, that both paid and unpaid services produce irrelevant hits within their SERPs, because the current algorithms do not calculate whether a hit is actually accurate or not.

In other words, if a search engine thinks a document is a little relevant, it will show in the SERP, even if a relevant term is contained within a document in a completely different context.
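Here is a minimal Python sketch of that behaviour. It assumes a deliberately naive scorer that only counts how often the query terms appear in a document; this is my own simplification, not how any real search engine ranks results, and the three example documents are invented.

```python
import re


def tokens(text: str) -> list:
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def score(query: str, document: str) -> int:
    """Count how many times the query terms occur in the document."""
    doc_tokens = tokens(document)
    return sum(doc_tokens.count(term) for term in set(tokens(query)))


documents = {
    "zoology": "The jaguar is a big cat that hunts in the rainforest.",
    "motoring": "The new Jaguar saloon is a fast car with a quiet engine.",
    "cooking": "Slow roast the vegetables for an hour.",
}

query = "jaguar cat"
scores = {name: score(query, text) for name, text in documents.items()}

# Anything with a score above zero is shown, best match first.
ranked = sorted(
    (name for name, s in scores.items() if s > 0),
    key=lambda name: (-scores[name], name),
)
print(ranked)
```

The motoring page still appears in the results for a query about the animal, simply because it contains the word "Jaguar" in a completely different context.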

*This is not necessarily a librarian’s viewpoint. This study was completed in Manchester Metropolitan University’s largest library, the Sir Kenneth Green Library.

What is the Web?

What is the Internet? What is the Web? Is there a difference?

People use the terms Web and Internet interchangeably, but they are in fact very different.

“The World Wide Web, or Web, is in fact just one of a number of ways information can be exchanged over the Internet, another being e-mail” (Murphy and Persson 2009:4).

The Internet, on the other hand, refers to the physical makeup of how we communicate (i.e. the cables that carry the images and the switches that receive the signals from these cables).

Sir Tim Berners-Lee invented HyperText Markup Language (HTML) and allowed people to use it for free. HTML is the backbone of the Web because it allows everyone to participate in the communication of information.

So the Internet refers to the physical network(s), whereas the Web allows us to use the Internet as a means of communication.

Are there different types of Web?

In short, the Web is all the same; however, how we access (and interact with) it differs. The Web is full of information that can be accessed in many different ways, so much so that people refer to different sections of the Web. To give you an overview of these names, see the list below:

  • Opaque Web refers to files that can be, but are not, indexed (Sherman & Price 2001).
  • The private Web is technically indexable, but it is kept out of search engines because it is protected by passwords.
  • Proprietary Web requires users to agree to special terms before they use the service (e.g. NYTimes).
  • Invisible Web refers to parts of the Web that cannot be accessed by search engines, such as social media.
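The names in the list above can be sketched as a small Python function. This is only my own simplification of the definitions in this post: the `Page` class and its four flags are hypothetical, the order of the checks is an assumption, and a real page could arguably sit in more than one section.

```python
from dataclasses import dataclass


@dataclass
class Page:
    indexed: bool = False             # already in a search engine's index
    password_protected: bool = False  # needs a login
    requires_terms: bool = False      # user must agree to special terms first
    crawler_can_reach: bool = True    # a search engine can access it at all


def web_section(page: Page) -> str:
    """Name the section of the Web a page falls into (simplified)."""
    if page.password_protected:
        return "private Web"
    if page.requires_terms:
        return "proprietary Web"
    if not page.crawler_can_reach:
        return "invisible Web"
    if not page.indexed:
        return "opaque Web"
    return "surface Web"


print(web_section(Page(indexed=True)))             # surface Web
print(web_section(Page(password_protected=True)))  # private Web
print(web_section(Page()))                         # opaque Web
```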

The fact that there are so many different names for sections of the Web reflects its truly global scale: the Web is a huge place, and people are still unsure just how big it actually is.

References

  1. Murphy, C., and Persson, N. (2009) HTML and CSS Web Standards Solutions. USA: Apress and Friends of Ed.
  2. Pedley, P. (2001) The Invisible Web. London: Aslib-IMI.
  3. Sherman, C., & Price, G. (2001) The Invisible Web. New Jersey: Info Today Inc.