Searching the Web

The deep web: web documents that conventional search engines index badly or not at all

Introduction

"The Invisible Web is composed of web documents which are badly indexed by the broad-based conventional search engines" (or not indexed at all). "Well known databases such as LexisNexis, Dialog… Factiva - only constitute 1% of the Deep Web!" -- Digimind, 2007

Figure: Iceberg symbolizing Surface Web vs. Deep Web

Deep Web Search Engines

There are many ways to search the deep web; here are but a few:

"Deep" or "Invisible" Web

What is in this space:

  • Protected pages (sign-in required)
  • Dynamically generated pages
  • Commercial resources with domain or IP limitations (e.g., an intranet)
  • Subscription databases

Further Reading

Dig deeper into this fascinating topic with these highlighted readings

How Big?

Nobody really knows! It is always changing.

  • An estimated 4,000 to 5,000 times larger than the surface web (2012 estimate)

“We found that the deep Web measured 450,000 Web databases*, among which 348,000 were structured.” Only a very small percentage of these are searchable from surface web search engines like Google.

* Databases are structured, searchable web sites containing distinct collections of objects in text, image, audio, video… format.

Source: Structured Databases on the Web: Observations and Implications (highly cited paper from 2004)
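
To put the two figures quoted above in proportion, the short Python sketch below works out what share of the 450,000 databases were structured. The variable names are illustrative only, and the numbers are simply those given in the quotation, not new measurements.

```python
# Figures taken from the quotation above (2004 survey of deep web databases).
total_databases = 450_000
structured_databases = 348_000

# Everything that was not counted as structured.
unstructured_databases = total_databases - structured_databases

# Share of the total that was structured, as a percentage.
structured_share = structured_databases / total_databases * 100

print(f"Structured:   {structured_databases:,} ({structured_share:.1f}%)")
print(f"Unstructured: {unstructured_databases:,} ({100 - structured_share:.1f}%)")
```

By this arithmetic, roughly three quarters of the databases counted in the survey were structured.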