A Short Tutorial To Search The Web
Note: This is an archived version of "The Pandia Goalgetter". Please see the note at the end of the article for more details.
There are millions and millions of webpages out there. However, as most of us have trouble finding an old letter on our own computer, how can we find relevant information on this "global hard drive"? After all, this is the closest thing we get to a World Wide Anarchy.
Well, there are people out there trying to catalogue the Web for us. Furthermore, virtual robots are scurrying around, trying to map the vast expanses of Cyberspace. Although most of them can cover only a small part of the Net, the task of finding anything among some two billion pages is still daunting.
However, the main problem is not that the search engines and the search directories find too little, but that they find too much. It is hard to uncover the needle in a list of 400,000 hits. That's why this article brings you this short and easy tutorial. To get the right answer, you must ask the right question. This tutorial will tell you exactly how to do that!
It will take you approximately 30 minutes to read the entire tutorial through, and you will learn the essentials of Web searching in less than an hour. By improving your searching skills you will be able to find what you are looking for faster and more efficiently. How is that for an investment?
What kind of search engines or directories should you use?
Most people are primarily interested in tools for finding information on the World Wide Web. Originally there were two kinds of search services on the Web: directories and engines.
Search directories are hierarchical databases with references to websites. The websites that are included are hand picked by living human beings and classified according to the rules of that particular search service.
ODP or DMOZ is the mother of all search directories. However, in order to search the directory you must go to their special directory page. Their regular search results are not fetched from the directory, but from their search engine.
Directories are very useful when you have no more than a general notion of what you are looking for.
The first page normally gives you the most general categories (like "Computers and Internet" or "Education"). Click your way down the hierarchy to the right category, select the website you find the most interesting and start reading.
If you use the search form when exploring a directory, remember that you are not searching the text of the actual webpages of a particular site. Instead you are searching the text contained in the site title and the description of the site. These are composed by the directory editors, and are often based on suggestions from the site owners themselves.
In addition most directories will also search the words contained in the category titles or descriptions.
Note, however, that some search directories may add data from regular search engines if they cannot find matches to your query. We will tell you more about this below.
Search engines are -- well -- "engines" or "robots" that crawl the Web looking for new webpages. These robots read the webpages and put the text (or parts of the text) into a large database or index that you may access. None of them cover the whole Net, but some of them are quite large.
The major players in this field are Google, Yahoo! Search (which is not the same as the Yahoo! Directory), MSN Search (now known as Bing), DuckDuckGo and Ask.
Search engines should be your first choice when you know exactly what you are looking for. They also cover a much larger part of the Web than the directories.
There are also "metasearch" services which search several search engines and directories at the same time, trying to extract the most relevant hits from all of them.
You might find it useful to start your searching with one of these, just to get a general feeling for what is out there. The search syntax is problematic, however.
It may vary from search engine to search engine, which means that the metasearch engine has to try to "translate" your query into a language that each search engine will understand. More often than not, they will not try to do so.
For more complex searches, you should go directly to the relevant search engine. Also note that the metasearch engines will give you but a small part of the results from each individual search engine.
The best search services
Search engines now index billions of webpages.
One thing remains true, however: The search engines do not all cover the same parts of the Internet Universe, which gives you every reason to use more than one of them.
For metasearching we recommend Yippy and Ixquick.
However, do try the other search services as well! Some of them may be perfect for your needs.
Now, let's go to the next part of the search tutorial!
Advanced Web searching -- as easy as ordering pizza
You're hungry. You go into a restaurant, sit down by the table and wait for the waiter. The waiter arrives, coughs politely, and asks: "What do you want for dinner, sir?" or "What would you like today, madam?" "Food," you answer, "food".
Fortunately, waiters are an understanding and patient lot. "Certainly, sir. What kind of food did you have in mind? May I recommend the salmon?"
Your average search engine is not that understanding. A search for food brings up several million webpages. Those million pages are just too many to stomach. And, no, the search engine does not try to find out what you're really looking for.
Still, a lot of Internet searchers actually ask questions like these: "sport", "books", "news".
So, what do you do? You refine your question. You become more specific. You provide more information.
"I would like a pizza with pepperoni and ham, but with no olives and no garlic."
Here's the good news: If you are able to order a pizza like that, you are able to use advanced "Boolean" searching on the Internet. It's actually that easy.
Boolean searching -- the operators AND, AND NOT, OR
You have asked for pizza with pepperoni and ham, but without olives and garlic. Here's how your order will look using Boolean operators:
pizza AND pepperoni AND ham AND NOT olives AND NOT garlic.
A search engine would interpret this Boolean expression in the following way:
"The user wants me to show him or her links to all the pages that include the word pizza as well as the word pepperoni and the word ham, but he or she wants me to subtract pages that include the word olives or the word garlic."
It isn't poetry, but it is logical and it works. The operator AND means that the word that follows has to be in the text of the pages that are to be listed. Pages including the words following AND NOT will not be listed.
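To see the logic at work, here is a minimal Python sketch. It is not how any real search engine is built: the three tiny "pages" and the helper name pages_with are made up for illustration. Each page is reduced to the set of words it contains, so AND becomes a set intersection and AND NOT a set difference.

```python
# Three made-up pages, each reduced to the set of words it contains.
PAGES = {
    "page1": {"pizza", "pepperoni", "ham"},
    "page2": {"pizza", "pepperoni", "ham", "olives"},
    "page3": {"pizza", "ham", "garlic"},
}

def pages_with(word):
    """Return the names of all pages that contain the word."""
    return {name for name, words in PAGES.items() if word in words}

# pizza AND pepperoni AND ham AND NOT olives AND NOT garlic
hits = (
    pages_with("pizza") & pages_with("pepperoni") & pages_with("ham")
) - pages_with("olives") - pages_with("garlic")

print(sorted(hits))  # ['page1']
```

Only page1 survives: page2 is subtracted because of the olives, and page3 lacks pepperoni and mentions garlic.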
If you suspect that the restaurant is out of pepperoni, you may be a little more open-minded about this, and say: "I would like pepperoni or chicken". In Boolean terms that is:
pepperoni OR chicken
On the Net an order like this one will give you all the pages that include the word pepperoni, all the pages that include the word chicken and all the pages that include both of these words.
What happens if you take out the operators AND, AND NOT and OR and write the following line instead?
pizza pepperoni ham olives garlic
Most search engines interpret the space between the words as AND. That is, they will give you all the pages that include all these words. But that was not what you were looking for, was it? You are interested in pages that do not include the word olives or garlic, not in pages that have to include these words.
Then again, some engines may interpret the space between the words as OR. This means that they will even give you pages that include only one of these words. You will, for instance, end up with a lot of irrelevant information about the garlic industry.
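The difference between the two readings is easy to show in a few lines of Python. This is only a sketch under simple assumptions: the three pages are invented, and the function search with its mode parameter is a hypothetical helper, not the interface of any real engine.

```python
# Made-up pages for illustration only.
PAGES = {
    "pizzeria menu": "pizza with pepperoni ham olives and garlic",
    "garlic industry report": "garlic exports rose sharply last year",
    "plain pizza recipe": "pizza with pepperoni and ham",
}

def search(query, mode):
    """Treat the spaces in the query as AND (mode='and') or OR (mode='or')."""
    terms = query.lower().split()
    combine = all if mode == "and" else any
    return [
        title
        for title, text in PAGES.items()
        if combine(term in text.lower().split() for term in terms)
    ]

query = "pizza pepperoni ham olives garlic"
print(search(query, "and"))  # ['pizzeria menu']
print(search(query, "or"))   # all three pages, including the garlic report
```

Neither reading gives you the pizza without olives and garlic that you actually ordered.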
At the moment true Boolean searching is supported by most of the major search engines.
If you ask search engines for a pan pizza they may not only give you pages on pizza and pan pizza, but also information about the god Pan, Pan flutes, frying pans, Peter Pan, Pan Arabian co-operation and more.
You need a way of telling the search engine that pan pizza is an expression or a phrase. For this you use double quotation marks: "...", like this:
"pan pizza" AND "Italian pepperoni" AND "black olives"
This will tell the search engine to look for pages that include the text string pan pizza, not the word pan in general.
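A rough Python sketch of the difference between a phrase search and a search for the individual words might look like this. The two sample pages and both function names are assumptions made for the example, and the matching is deliberately naive (it splits on spaces and ignores punctuation).

```python
def contains_phrase(text, phrase):
    """True if the words of the phrase appear in the text in that exact order."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def contains_all_words(text, phrase):
    """True if every word of the phrase appears somewhere in the text."""
    words = set(text.lower().split())
    return all(word in words for word in phrase.lower().split())

page_a = "We serve deep pan pizza every Friday"
page_b = "Peter Pan ordered a pizza"

print(contains_phrase(page_a, "pan pizza"))     # True
print(contains_phrase(page_b, "pan pizza"))     # False
print(contains_all_words(page_b, "pan pizza"))  # True
```

The Peter Pan page contains both words, so it turns up in a plain search, but it does not contain the phrase.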
Proximity: the NEAR-operator
What if you are looking for a sequence of words that are normally connected, but that may be split by other words? If you were looking for information on the inventor Thomas Alva Edison, you could possibly search for a phrase, like this:
"Thomas Alva Edison"
But this search would not bring you pages where the name is given as Thomas A. Edison or Thomas Edison. You could solve this problem by entering
"Thomas Alva Edison" OR "Thomas A. Edison" OR "Thomas Edison"
or you could use the NEAR search operator. NEAR means "show me pages where these words are near each other".
Thomas NEAR Edison
How near is NEAR? That depends. In AltaVista the words had to be less than 10 words apart. Some engines let you set the distance yourself. A query such as
dogs near/3 cats
finds documents in which dogs and cats occur within three words of each other, in either order.
By altering the number, you can decide how far apart the keywords can be in order to be included in the results.
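For the curious, a proximity test can be sketched in Python as follows. The function near and its max_distance parameter are hypothetical, and "distance" is assumed here to mean the difference between the two word positions; real engines may count differently.

```python
import re

def near(text, first, second, max_distance=10):
    """True if the two words occur within max_distance words of each other,
    in either order. The default of 10 is only an illustrative choice."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

print(near("Thomas Alva Edison was born in Ohio", "Thomas", "Edison"))   # True
print(near("Thomas A. Edison held many patents", "Thomas", "Edison", 3))  # True
print(near("dogs are said to be the natural enemies of cats", "dogs", "cats", 3))  # False
```

One query covers Thomas Alva Edison, Thomas A. Edison and Thomas Edison, which is exactly the point of NEAR.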
Until Yahoo! took over the company, AltaVista Advanced Search allowed use of this operator. Since it started using the Yahoo! search engine for its results, AltaVista has no longer supported NEAR.
The only regular web search site allowing this operator until recently was -- interestingly enough -- AOL's version of Google: AOL Search.
However, there is a relatively new European search engine on the block that does support quite a few advanced search features, including NEAR: Exalead.
Please note that some search engines and directories are partially case sensitive. If you spell a word or a phrase with lower case letters in the search form, the engine will match both upper and lower case letters on the webpage.
Searches for "apple computer" will give you pages with apple computer, Apple Computer and even APPLE COMPUTER. It is normally not the other way round. A search for "Bill Gates" will give you Bill Gates but not bill gates.
As you can see, this might be useful when you are looking for persons. By using capital letters in "Bill Gates", you avoid pages including the words bill (meaning invoice) and gates (meaning portals) only.
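The rule described above can be sketched in Python like this. The function matches is a hypothetical helper that mimics the behaviour as this tutorial describes it; it is not taken from any particular search engine.

```python
def matches(query, text):
    """Partial case sensitivity: an all-lowercase query matches any
    capitalization, while a query containing capitals must match exactly."""
    if query == query.lower():
        return query in text.lower()
    return query in text

print(matches("apple computer", "APPLE COMPUTER news"))      # True
print(matches("apple computer", "Apple Computer news"))      # True
print(matches("Bill Gates", "Bill Gates spoke today"))        # True
print(matches("Bill Gates", "bill gates sent the invoice"))   # False
```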
Search engines may get confused. What does the following search imply, really?
"pan pizza" AND pepperoni OR ham AND olives
The use of parentheses -- nesting -- will clear things up:
"pan pizza" AND (pepperoni OR ham) AND olives
This means that you want a pizza with olives, but are uncertain whether you want pepperoni or ham on that pizza.
On the other hand:
("pan pizza" AND pepperoni) OR (ham AND olives)
means that you have to choose between a pepperoni pan pizza and a dish based on ham and olives.
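Python's own and/or operators can be used to check that the two groupings really are different questions. The page below is made up for the example: a pan pizza with pepperoni, but with no ham and no olives.

```python
# A made-up page: it mentions pan pizza and pepperoni, but not ham or olives.
page = {"pan pizza": True, "pepperoni": True, "ham": False, "olives": False}

# "pan pizza" AND (pepperoni OR ham) AND olives
first = page["pan pizza"] and (page["pepperoni"] or page["ham"]) and page["olives"]

# ("pan pizza" AND pepperoni) OR (ham AND olives)
second = (page["pan pizza"] and page["pepperoni"]) or (page["ham"] and page["olives"])

print(first)   # False: the page does not mention olives
print(second)  # True: the page matches the pepperoni pan pizza branch
```

The same page is rejected by the first query and accepted by the second, so the parentheses are doing real work.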
Now you know the basics. Some engines use the expression NOT instead of AND NOT, but if you stick to AND NOT it should work anyway. Write AND NOT in two words. The only exceptions are Pandia Plus and the Open Directory, actually. Here you have to write ANDNOT in one word. And no, don't ask us why!
Some search engines want you to write the Boolean operators in CAPITAL letters. The rest will ignore the difference between upper and lower case. If you use capital letters you are on the safe side.
Truncation or wildcards*
The English language gives you many variations of the same word: dog and dogs, give and giving. Many expressions are combinations of several words: doghouse. You may be looking for some of these combinations at the same time, normally the singular and plural form of the same noun.
In most search engines and directories, a search for
dog*
will give you pages with all words starting with the three letters dog, including dog, dogs, dogged, doggy and dogma. As you can see, if you were looking for dog and dogs, you will be picking up some unwanted hits. Truncation or wildcards work best when the stem is longer and if the stem is not a root of many other common words.
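In Python terms, a trailing wildcard on the stem dog is simply a "starts with" test. The word list and the function name truncation_search below are made up for illustration.

```python
# A made-up list of words found on various pages.
words_on_pages = ["dog", "dogs", "dogged", "doggy", "dogma", "doghouse", "cat"]

def truncation_search(stem, words):
    """Return every word that starts with the stem, as a wildcard search would."""
    return [word for word in words if word.startswith(stem)]

print(truncation_search("dog", words_on_pages))
# ['dog', 'dogs', 'dogged', 'doggy', 'dogma', 'doghouse']
print(truncation_search("doghous", words_on_pages))
# ['doghouse']
```

The short stem drags in dogma and dogged; the longer stem does not, which is why longer stems give cleaner results.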
Please note that a lot of search engines "stem" keywords, i.e. they will automatically search for dog if you enter the keyword "dogs" and vice versa.
Note also that Google has introduced a special "tilde"-operator that lets you search for synonyms. If you place the tilde sign ("~") immediately in front of a keyword, Google will replace that keyword with a list of words with a similar meaning, thus extending your search.
For instance: to search for food facts as well as nutrition and cooking information, enter the following query: ~food ~facts.
Search engine math -- the easier way
Now, if you find Boolean operators too intimidating, there is an easier way. This is called simplified search syntax, pseudo-Boolean searching, implied Boolean or (according to Danny Sullivan of Search Engine Watch) "search engine math".
It goes like this:
+pizza +pepperoni +ham -olives -garlic.
Put a plus sign in front of words that must be present on the webpage. A minus sign in front of a word will tell the search engine to subtract pages that contain that particular word. Hence + equals the Boolean search term AND, and - the term AND NOT.
In most search engines you can combine the pluses and the minuses with quotation marks, as explained above. However, you cannot use brackets or the OR-operator.
Here is one example:
+"pan pizza" -olives pepperoni
This means that the pages the search engine shows you must include the phrase pan pizza, they must not include the word olives, and they should preferably include the word pepperoni.
If there is no sign in front of a word, most search engines will nevertheless read a + sign. The engine reckons that the word should be present. In other words: it will default to AND if it finds no "mathematical signs".
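A small Python sketch shows how such a query could be taken apart. The function names parse_query and page_matches are hypothetical, and the matching is a plain substring test. Unsigned words are kept in their own list, so that you can treat them either as "preferred" or as required, depending on the engine you have in mind.

```python
import shlex

def parse_query(query):
    """Split a query into required (+), excluded (-) and unsigned terms.
    shlex keeps a quoted phrase together as one term."""
    required, excluded, unsigned = [], [], []
    for token in shlex.split(query):
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            unsigned.append(token)
    return required, excluded, unsigned

def page_matches(text, query):
    """True if the text has every required term and no excluded term."""
    required, excluded, _unsigned = parse_query(query)
    text = text.lower()
    return (all(term.lower() in text for term in required)
            and not any(term.lower() in text for term in excluded))

print(parse_query('+"pan pizza" -olives pepperoni'))
# (['pan pizza'], ['olives'], ['pepperoni'])
print(page_matches("Pan pizza with pepperoni", '+"pan pizza" -olives pepperoni'))  # True
print(page_matches("Pan pizza with olives", '+"pan pizza" -olives pepperoni'))     # False
```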
The use of the minus sign may have some unforeseen consequences. Imagine that you are looking for webpages that contain information about the Star Wars movie, The Phantom Menace. You would like to avoid pages on earlier movies in order to reduce the number of hits:
+"Star Wars" +"The Phantom Menace" -"A New Hope" -"Return of the Jedi" -"The Empire Strikes Back"
All the earlier movies in the series are marked with a minus, meaning that pages that include these phrases should not be included in the "hit list". This means, however, that the search engine will subtract all the pages that include these phrases, including pages that have information on all the movies -- A New Hope as well as The Phantom Menace.
The information you are looking for may obviously be on one of those pages. Hence you should use the minus sign (or the AND NOT term for that matter) with great care.
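The following Python sketch illustrates the trap with two invented pages. The page titles and texts are made up; the point is only that the second page is thrown out although it covers the movie you are after.

```python
# Two made-up pages: a review of one movie and a guide to the whole series.
pages = {
    "Phantom Menace review": "Star Wars: The Phantom Menace opens on Naboo",
    "Complete saga guide": "Star Wars guide covering A New Hope and The Phantom Menace",
}

required = ["Star Wars", "The Phantom Menace"]
excluded = ["A New Hope", "Return of the Jedi", "The Empire Strikes Back"]

hits = [
    title
    for title, text in pages.items()
    if all(phrase in text for phrase in required)
    and not any(phrase in text for phrase in excluded)
]

print(hits)  # ['Phantom Menace review']
```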
Please note that there must not be any space between the relevant sign and the word! Write +"Star Wars", not + " Star Wars ".
Avoid using a "-" term as the first one in your query. Write dog -cat, not -cat dog.
When the search engine robots retrieve information from webpages around the world, they sort the information into various categories or "fields". The main fields that can be accessed in field searching are:
Title: This is the text you can read in the bar at the top of the browser window (not the main headline on the webpage itself). The title normally contains important keywords referring to the content of the page. If you restrict your search to the page titles, you will get fewer -- but more focused -- hits. You could for instance search for petunias AND title:gardening.
URL: This is the address (the Uniform Resource Locator) of a page, e.g. http://www.pandia.com/. You may restrict your search to pages with addresses that contain a certain word. If you want to restrict your search to the Pandia tutorial, you can do a search like this: "field searching" AND url:pandia.com/goalgetter.
Domains: The domain is the unique name that identifies an Internet site. Domain names have two or more parts, separated by dots. The part on the left is the most specific, and the part on the right is the most general. Cf. yahoo.com and pandia.com. The domain name is normally part of the Web and email address.
Some search engines allow you to restrict your search to a specific domain. By doing a domain-search you may for instance restrict your search to pages in a specific country. British pages normally end in the letters .uk. A search for Jaguar AND car AND domain:.uk should give you British pages containing information on the Jaguar car.
There are also some top level domains (com, org, net etc.) that are not restricted to specific countries, although they are predominantly American. You can use these endings to restrict your search to commercial (.com), US educational (.edu), US governmental (.gov) or US military (.mil) sites.
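Field searching can be pictured with a short Python sketch. Everything in it is an assumption made for the example: the two index entries, their addresses and the function field_search. A real index holds many more fields and far more pages.

```python
from urllib.parse import urlparse

# Made-up index entries with a title, an address and some page text.
INDEX = [
    {"title": "Gardening tips", "url": "http://www.example.co.uk/petunias.html",
     "text": "how to grow petunias"},
    {"title": "Flower shop", "url": "http://www.example.com/shop.html",
     "text": "petunias and roses for sale"},
]

def field_search(word, title=None, url=None, domain=None):
    """Find pages whose text contains the word, optionally restricted by field."""
    hits = []
    for page in INDEX:
        host = urlparse(page["url"]).hostname or ""
        if word.lower() not in page["text"].lower():
            continue
        if title and title.lower() not in page["title"].lower():
            continue
        if url and url.lower() not in page["url"].lower():
            continue
        if domain and not host.endswith(domain):
            continue
        hits.append(page["url"])
    return hits

print(field_search("petunias", title="gardening"))  # only the .co.uk page
print(field_search("petunias", domain=".uk"))       # only the .co.uk page
print(len(field_search("petunias")))                # 2
```

Each extra field restriction can only shrink the hit list, which is why field searching gives fewer but more focused results.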
OK. You find an interesting site in your favorite directory. You click on the relevant link, and -- alas -- get an error code!
If you get the message "Document not found" when trying to open a webpage, do not despair. The message confirms that the site exists, and the webpage may still be there. If you look at a Web address like this one: http://www.pandia.com/search/faq.html, you will see that it looks very much like a file address on a PC or a Mac (cf. C:\documents\letter.doc or harddrive:documents:letter.doc).
As a matter of fact, an HTTP-address is a file address. http:// tells your browser to look for a webpage; www.pandia.com tells it to look for a server or computer called www.pandia.com; /search/ tells it to look for the directory (or folder) called "search"; and the last part tells it to open a file called faq.html that should be in that directory.
However, there is no directory called "search" on this server. You have been given an incorrect or outdated address. There may be a file with information about faq.html higher up in the file hierarchy, though.
So here's what you do: Delete the last part of the address, back to the nearest "/", and repeat if necessary. In this example you are left with http://www.pandia.com/. Then hit "enter" and see what you get. If an address ends with a slash (/), you are not specifying what file the browser should look for.
Following the rules of the Internet, however, the browser will then look for a file that is defined as "default" by the server (normally called index.html or default.html). The main webpage or index in any directory is most often named -- you guessed it -- index.html. And there it is, http://www.pandia.com/index.html has a link to the Pandia FAQ.
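If you would rather let a script do the trimming, the idea can be sketched in Python with the standard urllib.parse module. The function name parent_urls is our own invention; it only builds the shorter addresses and does not check whether they exist.

```python
from urllib.parse import urlsplit, urlunsplit

def parent_urls(url):
    """Return shorter and shorter versions of the URL, cutting back to each "/"."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    while segments:
        segments.pop()  # drop the last file or directory name
        path = "/" + "/".join(segments) + ("/" if segments else "")
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

print(parent_urls("http://www.pandia.com/search/faq.html"))
# ['http://www.pandia.com/search/', 'http://www.pandia.com/']
```

Try the candidates in order, from the longest to the shortest, until one of them opens.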
The server does not have a DNS entry
If your browser is unable to locate the server (the computer containing the webpage) this could mean that the server does not exist any more. However, it could also be that the server is down for maintenance or that the network is busy. In any case: Try again later.
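A quick way to check whether a server name can be looked up at all is Python's standard socket module. This is a minimal sketch: the function name has_dns_entry is hypothetical, and a False result may also mean that your own network connection is down rather than that the server is gone.

```python
import socket

def has_dns_entry(hostname):
    """Return True if the name resolves to an address, False otherwise."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # The lookup failed: no DNS entry, or no working network right now.
        return False

# Example use (the result depends on your network connection):
# print(has_dns_entry("www.pandia.com"))
```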
If you have typed the address (URL), do check the spelling!
If everything fails, and you get the same error message the next day, you could visit Archive.org, a service that keeps copies of the webpages it has indexed on its servers. You may find an old version of the file you are looking for there.
Menu based Web searching
If you discuss advanced Internet searching with search engine officials, they will probably tell you that most searchers are not interested in learning true Boolean searching, and that they prefer menu based search options. This may be so, but then again most searchers do not know what they are missing.
We find menu based search facilities to be more confusing than Boolean searching, and they are not as flexible when it comes to building more complex queries. That being said, menu based pages for advanced searching may be quite efficient, as soon as you get a grip on how they work.
(If you do not know a search form from a web address field or a pull down menu from a radio button, please read the absolute beginners text box below first.)
A menu based search page will include one (or more) search forms where you enter your search query. The simplest versions will give you one form to enter all your keywords, and a pull down menu that gives various options regarding how these keywords are to be treated by the search engine.
Normally these options are:
- All these words, meaning that the search engine is to fetch pages that have all these words on them (equals Boolean AND or +)
- Any words or One of these words, meaning that the search engine is to fetch pages that have at least one of these words, but not necessarily all, on them (equals Boolean OR)
- This exact phrase, meaning that the search engine is to find pages that include these words in this particular order. When using Boolean searching or search engine math you would enclose the words in double quotation marks ("...")
This type of pull down menu does not give you the opportunity to exclude words (Boolean AND NOT). However, there are some search engines that let you distribute your search terms over several search fields, where each of them has its own pull down menu with options signifying whether this word or these words
- have to be included on the page (Boolean AND)
- may be included on the page (Boolean OR)
- must not be included on the page (Boolean AND NOT)
See, for instance, any search engine's advanced search page.
By filling in all the fields you can actually build quite complex queries.
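To show how the form fields correspond to Boolean operators, here is a small Python sketch that turns four form fields into one query. The field names (all_words, any_words, exact_phrase, none_words) and the function build_query are hypothetical, and the sketch assumes that at least one of the first three fields is filled in.

```python
def build_query(all_words="", any_words="", exact_phrase="", none_words=""):
    """Turn the contents of four hypothetical form fields into one Boolean query."""
    parts = all_words.split()                  # "All these words" -> AND
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # "This exact phrase" -> quotes
    alternatives = any_words.split()
    if alternatives:                           # "Any words" -> OR inside parentheses
        parts.append("(" + " OR ".join(alternatives) + ")")
    query = " AND ".join(parts)
    for word in none_words.split():            # "Must not be included" -> AND NOT
        query += f" AND NOT {word}"
    return query

print(build_query(all_words="olives", any_words="pepperoni ham",
                  exact_phrase="pan pizza", none_words="garlic"))
# olives AND "pan pizza" AND (pepperoni OR ham) AND NOT garlic
```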
It helps to picture each of these forms as separate filters or sieves, one put beneath the other, and each of them filtering out and discarding a certain number of web pages. The search engine pours in all the web pages available and sorts out the pages you do not need on the basis of these filters.
For instance, if you tell the search engine that the pages that are to be fetched have to include the word “agriculture”, it will normally filter out all pages that do not include this word (Google makes exceptions to this rule, but that should not concern us here).
Most menu based pages for advanced searching also provide other types of filters, predominantly for various forms of “field searching”. For instance, you may limit the search to Web pages that have been made within a certain time period, i.e. you ask the search engine to filter out pages that do not belong to this period.
You may also select pages written in a particular language, thus excluding all other languages, or you may look for pages belonging to a certain site (pandia.com) or a certain type of domain (for instance .edu for American educational sites or .no for Norwegian sites), thus sorting out all pages that do not belong to this site or domain.
Each “filter” you apply will narrow down your search and return fewer results. You will normally have to experiment to get the optimal results – too many filters and you end up sorting out useful and relevant pages, too few and you end up with too many hits.
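The sieve picture translates almost directly into Python. In this sketch the three pages, their metadata and the function apply_filters are all made up for illustration; each filter is a small function that says whether a page may pass.

```python
# Made-up pages with the kind of metadata an advanced search form filters on.
PAGES = [
    {"url": "http://www.example.edu/farming.html", "language": "en", "year": 2004,
     "text": "agriculture in the Midwest"},
    {"url": "http://www.example.no/landbruk.html", "language": "no", "year": 2005,
     "text": "agriculture statistics for Norway"},
    {"url": "http://www.example.com/cars.html", "language": "en", "year": 2005,
     "text": "used cars for sale"},
]

def apply_filters(pages, filters):
    """Pour the pages through each filter in turn, keeping only what passes."""
    for keep in filters:
        pages = [page for page in pages if keep(page)]
    return pages

filters = [
    lambda page: "agriculture" in page["text"],  # must include this word
    lambda page: page["language"] == "en",       # language filter
    lambda page: page["year"] >= 2004,           # date filter
]

for page in apply_filters(PAGES, filters):
    print(page["url"])  # http://www.example.edu/farming.html
```

Remove the language filter and the Norwegian page comes back, which is the trade-off described above: fewer filters, more hits.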
An example of a menu search form.
Important: This is an archived, abridged and slightly revised version of Pandia's Goal Getter section. Although this site now mainly offers webmaster product comparisons, we have kept available a few pages from earlier versions of the website that visitors still seek out most.