Pandia

Pandia Post Newsletter No. 11 2001 Part 5

BOOKS

The Invisible Web

There is much talk about "the invisible" -- or "hidden" -- Web these days, but what is it? It is all about search engines, really. The traditional search engines do not search the whole Internet, whatever they say.

According to search engine experts Chris Sherman and Gary Price, the Invisible Web consists of material that general-purpose search engines either cannot or will not include in their collections of Web pages.

In their recent book The Invisible Web they define four types of invisibility:

The Opaque Web consists of files that can be, but are not, included in search engine indexes. Some search engines limit the number of webpages they include from each site, as this kind of "crawling" demands a lot of resources. Then there is the frequency of the crawl: there might be new pages that have not yet been included by the search engine. Some pages cannot be found because there are no links to them from other webpages, or because the webmaster has failed to inform the search engine about their existence.

The Private Web consists of webpages that are available but that have been deliberately excluded by the webmasters themselves. They may be password protected, or the webmaster might have included a "noindex" metatag or used a so-called robots.txt file to instruct the search engine robot to skip the page.
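To make the two exclusion mechanisms concrete, here is a minimal sketch in Python of how a well-behaved robot might check them before indexing a page. It is an illustration only, not how any particular search engine works: the robots.txt rules, the "ExampleBot" robot name, the example.com addresses and the helper names `is_crawlable` and `has_noindex` are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Looks for <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if (attr.get("name", "").lower() == "robots"
                and "noindex" in attr.get("content", "").lower()):
            self.noindex = True


def has_noindex(html):
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    print(is_crawlable(ROBOTS_TXT, "ExampleBot",
                       "http://www.example.com/private/report.html"))
    print(has_noindex('<meta name="robots" content="noindex">'))
```

Note that both mechanisms are requests rather than locks: a robot that ignores them can still read the page, which is why password protection is listed separately above.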

The Proprietary Web consists of pages that are only accessible to people who have registered to view them. Search engine robots cannot fill in a form, so there is no way they can get into restricted portions of a site.

The Truly Invisible Web consists of content that cannot be indexed for technical reasons. The documents may be in a file format that is not recognized by the search engine robot. Until quite recently most traditional search engines indexed regular webpages only (i.e. HTML-based documents). Google now indexes Acrobat PDF, PostScript and Microsoft Office files, but most search engines do not.

Then there are the dynamically generated webpages, i.e. webpages that are generated on the fly by a script that queries a database. Search engines will normally (but not always) avoid such pages, as their robots risk being trapped in an endless loop. Moreover, robots cannot fill in a form, so if the site requires you to fill in a form to get access to information, the search engine will not find it.

As Sherman and Price point out, the Web addresses of dynamically generated pages often include special characters like ? and &. For instance, http://www.pandia.com/index.html is a regular static webpage, whereas
http://www.pandia.com/cgi-local/meta.pl?search=%22Gary+Price%22&etype=web&template=m2.html
is a page generated by a script, in this case the Pandia Metasearch Engine.
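The difference between the two addresses can be shown with a short Python sketch that splits a URL into its parts. The `looks_dynamic` helper is our own rough rule of thumb (any address with a query string after the ? is treated as script-generated), not a test taken from the book or from any search engine.

```python
from urllib.parse import parse_qs, urlsplit

STATIC_URL = "http://www.pandia.com/index.html"
DYNAMIC_URL = ("http://www.pandia.com/cgi-local/meta.pl"
               "?search=%22Gary+Price%22&etype=web&template=m2.html")


def looks_dynamic(url):
    """Rough heuristic: a non-empty query string suggests a script-generated page."""
    return bool(urlsplit(url).query)


def query_parameters(url):
    """Return the decoded name/value pairs that follow the ? in the address."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    for url in (STATIC_URL, DYNAMIC_URL):
        print(url, looks_dynamic(url), query_parameters(url))
```

For the Metasearch address the parameters decode to a search for "Gary Price" (with the quotation marks), the search type web and the template m2.html. Each pair is separated by &, and %22 and + are simply encoded forms of the quotation mark and the space.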

All this means that there is a lot of valuable information out there that cannot be found using the traditional search engines. And that is what this book is about.

Not only do Sherman and Price give an excellent introduction to the concept of the Invisible Web, they also tell you how to access this part of the Internet. The second half of the book is actually a well-described catalog of portals that present Invisible Web content, as well as a directory of more than 1000 selected Invisible Web sites.

The actual selection of directory sites is a bit puzzling, as some categories are well represented and others are not, but the directory provides a lot of useful information just the same. The authors themselves say that because the Invisible Web is so huge, and constantly changing, creating a totally comprehensive directory is virtually impossible. Their goal was to go for quality over quantity, though they continue to add new resources as they find them.

The problem with printing Web resource catalogs in books is that they become outdated very fast. That is why the authors have decided to publish an updated version of the directory on the Web, at http://www.invisible-web.net/.

This book has actually been criticized for including too much general information on search engines and Web searching, and if you are the busy type who sticks to the executive summary, this book is probably not for you (although you can always skip the first chapters). We enjoyed the historical and technical introduction to Web searching very much, though. It actually makes this book a useful introduction to Internet searching in general.

Note that the Pandia Powersearch All-in-One search page also has a section on Invisible Web resources.

Buy this book from Amazon.com:
http://www.amazon.com/exec/obidos/ASIN/091096551X/ref=nosim/pandiainternetse/


Buy this book from Amazon.co.uk:
http://www.amazon.co.uk/exec/obidos/ASIN/091096551X/pandiasearchcent

Pandia Powersearch http://www.pandia.com/powersearch/index.html#specialized

How to Search the Web

There is no way we can give you the objective truth about this book, as we have written it ourselves.

It is the first in a series of three ebooks on search engines published by the Intellectua ebook company. How to Search the Web is a short and concise "three minute" tutor on efficient Internet searching, a bit similar to our Goalgetter Web Search Tutorial.

Unlike the Goalgetter tutorial, however, this guide is published in the popular Acrobat PDF format, meaning that you can print it in an easy to read format and read it in bed if you want to. All Web addresses are clickable, but they are also given in full, so that you can use your paper copy as a source of URLs.

The ebook covers all the major search engines and directories, and gives an easy to understand introduction to more advanced Internet searching.

Click here to read more about this ebook: http://www.dirtsmart.com/titles/3mt0010.html?10389

More books on search engines and Internet searching

FINALLY...

Do you like Pandia? Feel free to forward this newsletter to a friend. Click here to recommend the Pandia site to a friend: http://www.recommend-it.com/l.z.e?s=328530

Go to http://www.pandia.com/post/ to find information on how to subscribe and unsubscribe.

The Pandia Post is edited by Per and Susanne Koch. To stop spam, we show the email address as a graphic file. Pandia Post Home Page: http://www.pandia.com/post/.


Pandia is a registered service mark of P&S Koch, Oslo, Norway. All other company and product names are the trademarks or registered trademarks of their respective holders. © P&S Koch 1998-2012. Comments or questions? Go to our contact page.