The Deep Web or "Invisible Web"


The deep Web has received a lot of press lately. The Web is becoming a complex entity that contains information from a variety of source types. It is much more than fixed Web pages. In fact, the part of the Web that is not fixed, and is served dynamically "on the fly," is far larger than the fixed documents that many associate with the Web. Some people incorrectly refer to this content as the "invisible Web," for reasons that will be explained below.

When we refer to the deep Web, we are talking about content that is stored in searchable databases on the Web and served dynamically in response to a user query.

The phenomenon of databases on the Web was discussed for years before the terms "invisible Web" or "deep Web" were coined. People sometimes referred to them as specialty databases, subject-specific databases, virtual libraries, and other similar terms. As Web technology develops and greater amounts of information are mounted on the Web, these databases take on primary importance as information-finding tools.


Terminology


Why is this content referred to as the "invisible Web"? This is because the content of databases rarely shows up in a search engine result. Search engine spiders cannot or will not go inside database tables and extract the data. Database content is therefore "invisible" to them.

However, the term "invisible Web" is a poor choice for these reasons:

  1. The term is very search engine-centric. It assumes that the only way to find information on the Web is to consult a search engine. If the information cannot be found on a search engine, you're out of luck. This is simply not the case.

  2. There is no such thing as recorded information that is invisible. Some information may be more of a challenge to find than other information, but this is not the same as invisibility.

  3. Informational databases have been available for years. Many of us are familiar with a library's collection of CD-ROMs or Web-based research databases. We use online catalogs, which are databases of a library's holdings. No one has ever called this information a part of the "invisible library." These are simply databases whose content is available through user query. Just like a library, the Web contains information of different types that is stored and retrieved in different ways.

  4. The content of search engines on the Web is itself stored in databases and available only through user query. Shouldn't we call this invisible, too? We're labelling as invisible something that is available only through user query (the invisible Web) because it isn't accessible from within something else that is also available only through user query (search engines). The logic of this terminology just doesn't hold up.

A company called BrightPlanet has coined the term "deep Web" to describe the phenomenon of searchable databases on the Web. (The static Web is referred to as the "surface Web.") This term is much better, since database content is visible with the appropriate search and retrieval technology.
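The difference between surface and deep content can be illustrated with a small sketch. It is a toy model only, not a description of how any real search engine works: the site, its paths (`/`, `/about`, `/search`), the database records, and the helper names `fetch`, `crawl`, and `LinkCollector` are all hypothetical, and no network access is involved. The spider follows hyperlinks and never submits the search form, so the database records never reach its index, yet the same records are retrieved at once by a direct query.

```python
from html.parser import HTMLParser

# Hypothetical toy site: fixed pages keyed by path (the "surface Web").
STATIC_PAGES = {
    "/": '<a href="/about">About</a> <form action="/search"><input name="q"></form>',
    "/about": '<a href="/">Home</a>',
}

# Hypothetical database reachable only through the /search form (the "deep Web").
DATABASE = {
    "whales": "Record 17: migration routes of humpback whales",
    "ferns": "Record 42: fern species of New Zealand",
}


class LinkCollector(HTMLParser):
    """Collects href targets from <a> tags; forms are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(path, query=None):
    """Stand-in for an HTTP request against the toy site."""
    if path == "/search":
        return DATABASE.get(query or "", "No results")
    return STATIC_PAGES.get(path, "")


def crawl(start="/"):
    """Follow hyperlinks only, as a simple spider would."""
    seen, queue, index = set(), [start], {}
    while queue:
        path = queue.pop()
        if path in seen:
            continue
        seen.add(path)
        html = fetch(path)
        index[path] = html
        parser = LinkCollector()
        parser.feed(html)
        queue.extend(parser.links)
    return index


surface_index = crawl()                         # what the spider can see
deep_result = fetch("/search", query="whales")  # what a user query retrieves

print(sorted(surface_index))
print(deep_result)
```

In this sketch the spider indexes only the two fixed pages, while the query returns a database record that appears nowhere in that index. The record is not invisible; it simply requires a different retrieval method.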


A Few Tips for Dealing with the Deep Web


When dealing with the deep Web, keep these points in mind:


Sources of Deep Web Content


As noted above, deep Web sites can be located in subject directories and search engines. In addition, deep Web content is available on search engine sites as featured content.

There are also Web sites that specialize in collecting links to databases available on the Web. Not all of these services limit themselves to deep Web content, and the scope and usefulness of their holdings vary. Nonetheless, these sites provide an interesting look at the promise of deep Web searching.

CompletePlanet - Offers searchable access to thousands of databases for results that include summaries from the retrieved site; also offers the LexiBot software for accessing deep Web content
Direct Search - Large compilation of links to the search interfaces of a wide variety of research resources on the Web, compiled by Gary Price of George Washington University [warning: large file]
The InvisibleWeb - Directory of over 10,000 databases, offering the option to search for the database you need, from IntelliSeek
Invisible-web.net - Directory of high-quality deep Web databases maintained by Gary Price and Chris Sherman
Lycos Directory: Searchable Databases - Large collection of invisible Web databases organized by topic; almost identical to The InvisibleWeb above
ProFusion - Meta engine that also offers searches of multiple "vertical search sources" on the deep Web, organized into topical categories
Search.Com - Dozens of topic-based databases from CNet
Search Engines and News - Large collection of topical search engines and newswires maintained by wwwINTERNETS
Subject Directory of Search Engines - Topical listing of searchable databases on the Web from the SearchIQ search engine review site

If you are interested in this topic, CompletePlanet offers many details about the deep Web. See especially the Deep Web White Paper.