Short Course: Searching the Web with Altavista
by Denny Curtin

I. INTRODUCTION
II. TUTORIAL
III. ALTAVISTA REFERENCE
IV. PRACTICING SIMPLE QUERIES
V. SELF-TEST

Notice

This material may be printed out by individuals for personal use. It may not be copied or distributed in any way without the express permission of Tramline, Inc. Any individuals or organizations seeking authorization to make multiple copies or to otherwise use this material beyond individual personal use should contact info@tramline.com.

I. INTRODUCTION

Surfing the Web by clicking links to jump from page to page is fun, but it's no way to do serious research. To find things quickly, you need a search engine like AltaVista. In this guide, you'll explore how to use this powerful search engine to locate the information you want. To make things interesting, our theme will be Yosemite, one of America's most beautiful national parks. We'll see how much information we can find about the area, especially about the people, such as Carleton Watkins and Ansel Adams, who photographed it and showed its beauty to the rest of the world.

[Figure: Yosemite, view from Artist's Point]

Before we begin, here's a brief chronology of Yosemite to familiarize you with the area's history.

  • 14,000 years ago: glaciers carved the valley.
  • 1851: the first recorded visit by non-Native Americans, who proposed the name "Yosemity."
  • 1859: the first known photographs are taken by C.L. Weed.
  • 1861: Carleton Watkins begins photographing Yosemite.
  • 1868: John Muir makes his first visit.
  • 1872: Eadward Muybridge begins photographing with large glass-plate negatives.
  • 1890: Yosemite National Park is created by an Act of Congress.
  • 1892: the Sierra Club is organized.
  • 1916: Ansel Adams begins photographing with a Brownie camera on a family vacation.

II. TUTORIAL

In this tutorial, you are guided step by step through the techniques of successful searching. Before you begin, you might want to browse the reference section (Section III), which contains more detailed explanations of the search techniques you'll be using.

This tutorial deals only with AltaVista simple queries for pages on the Web; a separate tutorial covers advanced queries. However, the term "simple" is misleading. Even AltaVista has this to say about its simple and advanced searches: "Advanced search is for very specific searches and not for general searching. Almost everything you need to search for can be found quickly and with better results using the standard search box, where the AltaVista search services sorts the results by placing the most relevant content first. However, if you need to find documents within a certain range of dates or if you have to do some complex Boolean searches there isn’t a more powerful tool on the Web."

As you complete this tutorial, you'll learn how to:

  • List pages that contain any word or phrase you specify
  • Use upper- and lowercase letters in words and phrases
  • Use wildcards in words
  • List only those pages that contain all of the words or phrases you specify
  • Eliminate sites that contain specific words or phrases
  • Search for images
  • Find sites linked to another site
  • Locate URLs containing specific words

1. Searching for a Word

To begin, use your browser to go to the AltaVista site at http://www.altavista.com/. The AltaVista screen changes all the time, but for searching, the key element is the Search box. When you type a word or phrase (called a query) into this box and then click the Search button, AltaVista lists the pages in its index that contain the words you entered. It ranks the pages so that those with the most matches are listed first. Let's see what we can find on "Yosemite."

[Figure: the Search box]

Search: To begin, click in the Search box, type Yosemite, and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: The results are listed, and there are a lot of them. Notice that the Results page has several elements:

  • "AltaVista knows the answer to this question" is a new feature that tries to lure you away from your search.
  • "AltaVista found 256,570 Web pages for you" (your number will be different because the index is continually updated). This number is a good indicator of how precise your search has been, and a quarter-million listings obviously isn't very useful.
  • The numbered listings with links to sites are the real section of interest. Although many pages contain the word you searched for, AltaVista tries to rank them by relevance. Check out the first few pages to see how relevant they are.
[Figure: a numbered listing]

Each listing has a title that is underlined to indicate it's a link. When you point to it, the mouse pointer changes to a pointing hand, and the URL of the page it leads to appears at the bottom of your browser window. Below the title are a description of the page, its URL, and finally the date it was last modified.

2. Exploring Case

Now, let's see if the case of characters has any effect on our search. In the last query, you searched for Yosemite with an uppercase "Y." This time, let's use only lowercase letters and search for yosemite.

Search: Click in the Search box and select the current entry or delete it. Type yosemite and then click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: You should get slightly more hits when you use all lowercase. When you use uppercase, AltaVista lists only pages where the searched-for words have exactly the same case. When you use lowercase, however, it lists pages containing the words in any case. Here the difference is minor because you've searched for a place name, but in other situations you'll find a big difference. It's usually better to start with lowercase so you don't miss possibilities.

3. Using Wildcards

The asterisk (*) is a wildcard that stands for between zero and five lowercase letters. One of its main uses is to make sure you get all forms of a word, including singular, plural, and possessive.

Search: Click in the Search box and select the current entry or delete it. Type Yosemite* and then click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: Pages containing other forms of the word, such as the possessive in Yosemite's beauty, are now listed as well.

4. Using Multiple Words

To expand a search, you can enter more than one word in your query. Documents that contain any of the words are listed, although those that contain all of the words are placed at the top of the list. Let's see what we can find about the photographer Carleton Watkins and his photographs of Yosemite.

Search: Enter the query Carleton Watkins Yosemite and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: This search turns up any pages with one or more of the words, and there are a lot of them. It's called an OR query because it tells the computer to find any page containing Carleton OR Watkins OR Yosemite. A page only has to have one of the three words to be listed.

5. Searching for Phrases

The words Carleton and Watkins in the previous query make up a photographer's name. Let's treat them as a phrase to see what effect that has.

Search: Enter the query "Carleton Watkins" Yosemite and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: Now the number of hits drops because "Carleton Watkins" is treated as a single phrase rather than two separate words. Pages must contain the full name or the word Yosemite to be listed; documents with just Carleton or just Watkins won't be listed.

6. Forcing Matches

Up until now, pages have been listed when they contain any of the specified words or phrases. Let's use the plus sign (+) to limit the matches to those pages that contain both the photographer's name and Yosemite.

Search: Enter the query +"Carleton Watkins" +Yosemite and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: Only pages that contain both the name and the place are listed. This is called an AND query because to be listed, pages must contain both "Carleton Watkins" AND Yosemite.

7. Preventing Matches

To exclude pages that contain a particular word or phrase, put a minus sign (-) in front of it in your query. Let's see how many references there are to Carleton Watkins that don't also refer to Yosemite.

Search: Enter the query +"Carleton Watkins" -Yosemite and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: Only pages containing the photographer's name but not the place are listed. This is called a NOT query because, to be listed, pages must contain "Carleton Watkins" but NOT Yosemite.

8. Finding Images

The image: keyword is used to find images on the Web. Let's see if we can find any images of Yosemite. We'll use a wildcard in the search so we find images in any format.

Search: Enter the query image:yosemite.* and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: Pages containing an image file named yosemite with any extension (GIF, JPEG, and so on) are listed.

9. Checking Linked Sites

When you find a site you like, it's easy to find other sites that like it too. You do this with the link: keyword, which lists the pages that contain links to the site you're curious about. Let's see which sites link to the yosemite.org site.

Search: Enter the query link:yosemite.org and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: All pages containing links to yosemite.org are listed. These pages probably have content related to Yosemite since they linked to a Yosemite site.

10. Checking URLs

There may be sites that have a domain name or folder containing the word yosemite. Let's use the url: keyword to find out.

Search: Enter the query url:yosemite and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: All pages with yosemite anywhere in their URL are listed. Since the word is part of the page's address, these pages probably have content related to Yosemite rather than just a passing reference to the name.

11. Listing Pages on a Host

To get a list of pages on a host computer, you use the host: keyword. Let's use it to see what's on the yosemite.org computer.

Search: Enter the query host:yosemite.org and click AltaVista's Search button.

Write down the number of pages AltaVista has found: ________________

Result: All of the indexed pages on the yosemite.org site are listed. This is like a table of contents for the site and makes it easier to locate specific information without having to browse through the entire site.


III. ALTAVISTA REFERENCE

There are many search engines on the Web that help you zero in on the information you want to find, and AltaVista is one of the most powerful and most popular. AltaVista continually searches the Web for new pages to add to its index, which holds billions of words from millions of Web pages. When you enter a word or phrase in the Search box and then click the Search button, AltaVista searches its index for Web pages containing those words and displays a list of them on the Results page. The pages it finds are called matches or hits because at least one word on each page matches one of the words in your query. (Query is just a computer word for question.) AltaVista ranks the pages it finds using a set of rules (called an algorithm) that assigns a score to each page, and pages with higher scores are listed first. When entering queries, it's important to phrase them so the documents you want get higher scores. Higher scores are obtained when the words or phrases in your query:

  • appear in the first few words of the document, perhaps in the document’s title.
  • are located close to one another in the document.
  • appear more than once in the document.

Getting good at searching the Web takes some practice.

  • Use the results of one search as a guide to the next one.
  • If you get a match that is what you are looking for, see if the page contains unique words or word patterns that might guide you in a more refined search to locate other pages.

If you're curious about how AltaVista searches the Web, you might be surprised to learn that it uses a type of program called a spider (AltaVista named its spider Scooter). This program tirelessly wanders from link to link on the Web day and night. When it finds a new or updated page, it sends the entire page back to headquarters. There, other software takes all of the meaningful words in the document and lists them in an index along with the address of the page they came from.
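The indexing step just described is what programmers call an inverted index: a table from each word to the pages that contain it. The Python sketch below is a toy illustration, not AltaVista's actual software. The page URLs are hypothetical, and for simplicity it indexes every word and folds everything to lowercase instead of choosing "meaningful" words or keeping track of case.

```python
import re
from collections import defaultdict

# Simplifying assumption: a "word" is a run of letters and digits.
WORD = re.compile(r"[^\W_]+")

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it.

    pages is a dict of {url: page_text}.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in WORD.findall(text):
            index[word.lower()].add(url)
    return index

def lookup(index, word):
    """Return the sorted URLs of pages containing word (case ignored)."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical pages standing in for what the spider sends back.
pages = {
    "http://example.com/watkins": "Carleton Watkins photographed Yosemite.",
    "http://example.com/park": "Yosemite National Park was created in 1890.",
}
index = build_index(pages)
print(lookup(index, "Yosemite"))  # ['http://example.com/park', 'http://example.com/watkins']
print(lookup(index, "watkins"))   # ['http://example.com/watkins']
```

Answering a one-word query then becomes a single table lookup rather than a scan of every page, which is why an index of millions of pages can still be searched quickly.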

1. The AltaVista Screen Display

When you connect to the AltaVista Web site, its screen display contains a Search box and lots of other information that changes continually.

The Search Box

To look for pages on the Web, first try to think of some rare or unique words that might appear in the page you're looking for. The more unusual the words, the more likely you are to find it. Enter these words in the Search box and then click AltaVista's Search button.

The Results page

When you execute a query, any pages that contain your words are listed in the order AltaVista thinks is most relevant. Above the list of matches is a summary of the search that indicates how many times each of your search words was listed in the AltaVista index. It then lists the number of Web pages in its index that contain those words.

The Listed Pages

Below the summary area on the screen is a list of all of the Web pages that match your query. The title of each listed page is a hyperlink you can click to display the actual page. (Note that the page's title may not actually appear in the document itself; it is a name assigned to the document by its author.) The description that follows is taken from the first few lines of the document. This is followed by the actual URL, the size of the page, and the date it was last edited. If more than one URL is listed for a site, the pages differ in some respect.

The order in which pages are listed on the Results page is determined by a ranking algorithm. Each listed document is given a grade based on how many of your search terms it contains, where those words are in the document, and how close to each other they are.
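To make these ranking factors concrete, here is a toy scoring function in Python. It is only an illustration under our own assumptions: the weights (10 points per query word found, 2 for a word appearing in the first 10 words, 1 per repeated occurrence, and a small bonus when the words are close together) are arbitrary, since this guide doesn't describe AltaVista's actual algorithm.

```python
import re

WORD = re.compile(r"[^\W_]+")

def toy_score(query_words, document, lead=10):
    """Score a document with made-up weights for the factors named above."""
    words = [w.lower() for w in WORD.findall(document)]
    query = list(dict.fromkeys(q.lower() for q in query_words))  # drop duplicate terms
    positions = {q: [i for i, w in enumerate(words) if w == q] for q in query}
    present = [q for q in query if positions[q]]

    score = 10.0 * len(present)                                 # how many terms appear
    score += sum(len(positions[q]) - 1 for q in present)        # repeated occurrences
    score += sum(2 for q in present if positions[q][0] < lead)  # appears near the start
    if len(present) > 1:
        firsts = [positions[q][0] for q in present]
        score += 5 / (1 + max(firsts) - min(firsts))            # terms close together
    return score

query = ["Watkins", "Yosemite"]
docs = {
    "near": "Watkins photographed Yosemite",
    "far": "notes " * 20 + "Watkins " + "notes " * 20 + "Yosemite",
    "one": "Yosemite Falls",
}
ranked = sorted(docs, key=lambda name: toy_score(query, docs[name]), reverse=True)
print(ranked)  # ['near', 'far', 'one']
```

Containing more of the query words dominates the score here, and position and closeness only break ties among pages with the same words, which mirrors the order of importance the paragraph above suggests.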

The Page Number List

When the list of sites is too long to be displayed on a single page, a list of Results pages is displayed at the bottom of the screen. You will probably have to scroll to see it. Clicking one of the page numbers displays the sites listed on that page. You can also click [next >>] and [<< prev] to scroll through the pages. Different colors are used to indicate the page you are currently on, pages you have visited already, and pages you haven't yet visited.

2. Searching for Words

Knowing what to search for and how to phrase your query are basic skills you should acquire so you can quickly zero in on the information you want.

Searching for words such as digital or printer can give thousands or tens of thousands of matches. Searching for Where gives over 6 million matches, far too many to be useful. On the other hand, searching for the wrong form of a word might give too few. For example, searching for microprocessor (the singular) won't find microprocessors (the plural). When entering queries, it helps to know that AltaVista considers a word to be any series of letters and digits that begins and ends with:

  • white space (spaces, tabs, line ends)
  • non-alphanumeric characters such as & % $ / # _ ~
  • the start or end of the document
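The word rule above can be sketched in a few lines of Python. This is a toy model of the rule as this guide states it, not AltaVista's code: digits count as part of words, and every other character only separates words.

```python
import re

# Toy model: a word is a maximal run of letters and digits. Anything else
# (white space or punctuation such as & % $ / # _ ~) only marks where words
# begin and end. The class [^\W_] means "word characters except underscore".
WORD = re.compile(r"[^\W_]+")

def tokenize(text):
    """Split text into the words it would be indexed under."""
    return WORD.findall(text)

print(tokenize("AT&T and yosemite.com, 1890 Act_of_Congress"))
# ['AT', 'T', 'and', 'yosemite', 'com', '1890', 'Act', 'of', 'Congress']
```

This is also why, as the phrase section explains, AT&T and yosemite.com behave like two-word phrases: the punctuation disappears and only the words around it remain.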

When you enter more than one word, AltaVista treats these as OR queries and lists documents containing any of the words. However, it does place at the top of the list those pages that contain all of the words.

Examples

  • Library of Congress finds documents containing Library, of, OR Congress. Only some of the documents will refer to the Library of Congress.
  • Watkins Yosemite lists documents containing the names Watkins OR Yosemite. Only some of the documents will contain both.

Here are some tips to improve your results.

  • Start with just one word in your query, and then slowly add others, examining the results of each search. Generally, the more words you use, the more matches you'll get.
  • If you get too few matches, check your spelling. Searching for the misspelled word micorprocessor won't get any hits unless someone else has also misspelled the word in the same way.
  • Try synonyms or more specific terms instead of your original words. For example, instead of searching for chip, search for microprocessor, or even "Pentium Pro".

3. Understanding Case and Accents

AltaVista can be case-sensitive, so the case you use to enter search words is important.

[Figure: uppercase and lowercase letters]

  • Entering a word in lowercase finds words in any case. For example, searching for buffalo in a query will match buffalo, Buffalo, BuFFalo, or BUFFALO.
  • Using an uppercase letter forces AltaVista to find only exact matches for the word. For example, the capitalized word Buffalo in a query will only match Buffalo in the document, and ignore other capitalization variants.
  • Often words are in one case when they are the first word in a sentence, or in a title or heading and in another case when they fall elsewhere in the document. When running your first search, it's best to use all lowercase so you find all occurrences of the word or phrase. You should use uppercase only when you want to force a match to an exact spelling.
  • Accents are treated in the same way as capitalization. An accented word in a query forces an exact match of the entire word. For example, searching for résumé will not find resume. To find all occurrences, don't use accents and use all lowercase. For example, searching for resume will find both résumé and resume.
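The rules in this list boil down to one decision: a plain lowercase, unaccented query word matches loosely, while anything else must match exactly. The Python sketch below is our own toy model of that decision, using the standard unicodedata module to strip accents; it is not AltaVista's implementation.

```python
import unicodedata

def strip_accents(word):
    """Remove accents: decompose (é -> e + accent mark), then drop the marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def term_matches(query_term, doc_word):
    """Toy model of the case and accent rules listed above."""
    plain = query_term == query_term.lower() and query_term == strip_accents(query_term)
    if plain:
        # All lowercase, no accents: match the word in any case, with or without accents.
        return strip_accents(doc_word).lower() == query_term
    # An uppercase letter or an accent in the query forces an exact match.
    return doc_word == query_term

for query, word in [("buffalo", "BuFFalo"), ("Buffalo", "BUFFALO"),
                    ("resume", "résumé"), ("résumé", "resume")]:
    print(query, word, term_matches(query, word))
# buffalo BuFFalo True
# Buffalo BUFFALO False
# resume résumé True
# résumé resume False
```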

4. Using Wildcards

To find all forms of a word, such as words in both singular and plural, possessives, or words with a similar pattern, use the asterisk (*). This is a wildcard that will match between zero and five lowercase letters (not numbers) that occur at its position in the string.

Examples

  • librar* finds library, library's, and libraries
  • microprocessor* finds microprocessor, microprocessors, and even microprocessor's

TIP: You can also use the asterisk in the middle of a word, provided it is preceded by at least three characters (otherwise it would find far too many matches). If there are too many matches, AltaVista ignores the query. Uppercase letters and numbers are not matched by the asterisk.
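One way to picture the asterisk is as a regular expression. In this Python sketch (our own model of the rules stated above, not AltaVista's code) each * becomes "zero to five lowercase letters" and the three-character minimum is enforced. To keep it short, it ignores the case rules from the previous section and compares the rest of the term literally.

```python
import re

def wildcard_pattern(term):
    """Toy model: each * stands for zero to five lowercase letters (a-z only).

    At least three characters must precede the first asterisk.
    """
    star = term.find("*")
    if 0 <= star < 3:
        raise ValueError("at least three characters must precede *")
    literal_parts = [re.escape(part) for part in term.split("*")]
    return re.compile("[a-z]{0,5}".join(literal_parts))

def wildcard_matches(term, word):
    """True if the whole word fits the wildcard term."""
    return wildcard_pattern(term).fullmatch(word) is not None

for word in ["library", "libraries", "librarians", "librarianship"]:
    print(word, wildcard_matches("librar*", word))
# library True
# libraries True
# librarians True
# librarianship False
print(wildcard_matches("col*r", "colour"))  # True (asterisk in the middle)
```

The five-letter limit is what stops librar* from reaching librarianship, which needs seven extra letters.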

5. Searching for Phrases

Searching for phrases instead of words can dramatically narrow your search because matches only occur when a Web page has the same words in the same order. A phrase is a series of adjacent words separated by white space or punctuation. To search for phrases, enclose them in quotes. For example, searching for "Library of Congress" finds only documents with the complete phrase. (You can also use punctuation to glue words together into phrases, but it isn't recommended.) Since punctuation is treated as white space, Carleton;Watkins is the same query as "Carleton Watkins".

Examples

  • "Ansel Adams"
  • "Eadward Muybridge"
  • "Carleton Watkins"
  • "Glacier Point"
  • "Half Dome"

When AltaVista indexes pages on the Web, it ignores punctuation marks and white space except to indicate where words begin and end. For this reason, you can’t search for punctuation or white space, but you can use either to join words in a phrase so it's treated as a unit. For example, searching for "Library of Congress", Library/of/Congress, or Library-of-Congress gives the same results. (Don't use the asterisk for this purpose since it has a special meaning.)

Note that because of the way AltaVista handles punctuation, strings such as AT&T or yosemite.com are treated as two words joined together in a phrase.
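Phrase matching as described here can be sketched in Python: split both the phrase and the page into words (punctuation only separates them), then look for the phrase's words side by side in the same order. This is a toy model under our own assumptions; in particular, it ignores case for simplicity.

```python
import re

WORD = re.compile(r"[^\W_]+")

def contains_phrase(phrase, document):
    """Toy model: the phrase's words must appear adjacent and in the same order.

    Punctuation and white space only separate words, so "Library of Congress",
    Library/of/Congress, and Library-of-Congress reduce to the same word list.
    Case is ignored here for simplicity.
    """
    wanted = [w.lower() for w in WORD.findall(phrase)]
    words = [w.lower() for w in WORD.findall(document)]
    n = len(wanted)
    return n > 0 and any(words[i:i + n] == wanted
                         for i in range(len(words) - n + 1))

text = "Prints of Yosemite are held by the Library of Congress."
print(contains_phrase("Library-of-Congress", text))  # True
print(contains_phrase("Congress of Library", text))  # False
```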

6. Forcing and Preventing Matches

When you use two or more words or phrases in a query, documents that contain any of the words are listed. Some of the documents will not contain all of the words. To ensure that only documents with a specific word are listed, put a plus sign in front of the word in your query. Be sure not to require too many such words or phrases because you may eliminate documents that would be of interest.

+ specifies that the document must contain the word

Many queries display long lists of matches that are of no use. If they have a word or phrase in common, such as a company name, you can prevent them from being listed. To do so, place a minus sign in front of the word or phrase.

- specifies that the document must not contain the word

Examples

"Carleton Watkins" Yosemite will list documents containing the name Carleton Watkins OR Yosemite.

+"Carleton Watkins" +Yosemite will only list documents containing both Carleton Watkins AND Yosemite.

+"Carleton Watkins" Yosemite will find documents that contain Carleton Watkins and may or may not contain Yosemite.

+"Carleton Watkins" -Yosemite will only list documents containing the name Carleton Watkins and NOT Yosemite.
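The four examples above can be reproduced with a small Python model of simple-query logic: unmarked terms are OR'd together, + terms are required, - terms are excluded, and pages containing more of the terms are listed first. This is a sketch under our own assumptions (case is ignored, and the sample documents are made up), not AltaVista's actual query engine.

```python
import re

# One query term: optional + or -, then a "quoted phrase" or a bare word.
QUERY_TERM = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')
WORD = re.compile(r"[^\W_]+")

def words(text):
    """Lowercased words of text (case rules are not modeled here)."""
    return [w.lower() for w in WORD.findall(text)]

def has_term(term, doc_words):
    """True if the term's words occur adjacent and in order in doc_words."""
    wanted = words(term)
    n = len(wanted)
    return n > 0 and any(doc_words[i:i + n] == wanted
                         for i in range(len(doc_words) - n + 1))

def simple_query(query, documents):
    """Toy model of a simple query: + requires, - excludes, others are OR'd."""
    terms = [(op, phrase or word) for op, phrase, word in QUERY_TERM.findall(query)]
    results = []
    for doc in documents:
        doc_words = words(doc)
        found = [(op, has_term(term, doc_words)) for op, term in terms]
        if any(op == "+" and not hit for op, hit in found):
            continue  # a required term is missing
        if any(op == "-" and hit for op, hit in found):
            continue  # an excluded term is present
        hits = sum(1 for op, hit in found if hit and op != "-")
        if hits:
            results.append((hits, doc))
    # Pages containing more of the terms go first; ties keep their original order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in results]

docs = [
    "Carleton Watkins photographed Yosemite Valley.",
    "Carleton Watkins also photographed the Columbia River.",
    "Yosemite National Park was created in 1890.",
    "Watkins Glen is in New York.",
]
print(simple_query('+"Carleton Watkins" -Yosemite', docs))
# ['Carleton Watkins also photographed the Columbia River.']
```

Running the other example queries against the same documents shows the pattern: the plain OR query returns three of the four, requiring both terms leaves only the first, and the last document (Watkins but no Carleton) never matches the quoted phrase.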

7. Finding Specific Things on Web Pages

To limit your search to the structured parts of a document, you use a keyword (in lowercase), a colon, and then the word or phrase you are searching for.

Searching for Titles and Hyperlinks

  • title:"The Yosemite Observer" matches pages with the phrase The Yosemite Observer in the title. The title this keyword searches isn't displayed on the page itself; it's a name the author has assigned to the page. The page's title is what's displayed on the browser's title bar when you view the page, and it is the first item in each listing on AltaVista's Results page.
  • anchor:"Yosemite by Ansel Adams" matches pages with the visible phrase Yosemite by Ansel Adams in the text of a hyperlink. The anchor is the part of the link that's visible on the page and highlighted.
  • link:yosemite.org displays pages that contain the specified link, in this case at least one link to a page with yosemite.org in its URL. The link isn't visible on the page; you see it on the browser's status bar when you point to an anchor.

Searching for Text and Images

  • text:yosemite matches pages that contain the word yosemite in any part of the visible text of a page.
  • image:yosemite.* lists pages whose image tags refer to a file named yosemite with any extension.
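Each of these keywords looks at a different part of a page's HTML. The Python sketch below uses the standard library's html.parser to pull those parts out of a hypothetical sample page: the title text (title:), the visible text of each link (anchor:), each link's href target (link:), and each image's src file name (image:). It only shows where the information lives in a page; it is not how AltaVista actually parsed pages.

```python
from html.parser import HTMLParser

class PageFields(HTMLParser):
    """Collect the HTML parts that title:, anchor:, link:, and image: look at."""

    def __init__(self):
        super().__init__()
        self.title = ""     # <title> text (title:)
        self.anchors = []   # visible link text (anchor:)
        self.links = []     # href targets (link:)
        self.images = []    # img src values (image:)
        self._in_title = False
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_anchor = True
            self.anchors.append("")
            if attrs.get("href"):
                self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_anchor:
            self.anchors[-1] += data

# Hypothetical sample page.
page = """<html><head><title>The Yosemite Observer</title></head>
<body><a href="http://www.yosemite.org/">Yosemite by Ansel Adams</a>
<img src="images/yosemite.jpg"></body></html>"""

fields = PageFields()
fields.feed(page)
print(fields.title)    # The Yosemite Observer
print(fields.anchors)  # ['Yosemite by Ansel Adams']
print(fields.links)    # ['http://www.yosemite.org/']
print(fields.images)   # ['images/yosemite.jpg']
```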

Searching for URLs

  • url:yosemite looks at all parts of the page's URL, in this case for yosemite. It will find www.yosemite.org and www.parks.gov/yosemite.htm.
  • host:yosemite looks only at the domain name part of the page's URL, in this case for yosemite, and lists all of the pages on matching sites. It will find www.yosemite.org but not www.parks.gov/yosemite.htm.
  • domain:gov looks only at the topmost level of the page's domain name, in this case for gov. Folders with the same name won't be listed, so your search will be significantly narrowed. There are currently only a few top-level domains, including .gov, .com, .org, .edu, .net, and the country codes, although the list is being expanded.
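The three URL keywords differ only in which part of the address they examine. This Python sketch is our own toy model built on the standard library's urllib.parse.urlsplit: url: checks the words of the whole address, host: only the words of the host name, and domain: only the last label of the host name (the top-level domain). The matching rules are assumptions based on the descriptions in this section, and the example addresses are the ones used above.

```python
import re
from urllib.parse import urlsplit

WORD = re.compile(r"[^\W_]+")

def _words(text):
    return [w.lower() for w in WORD.findall(text)]

def _has_words(term, text):
    """True if the term's words appear adjacent and in order in text."""
    wanted, have = _words(term), _words(text)
    n = len(wanted)
    return n > 0 and any(have[i:i + n] == wanted for i in range(len(have) - n + 1))

def match_url(term, url):
    """url: look anywhere in the address."""
    return _has_words(term, url)

def match_host(term, url):
    """host: look only at the host name, e.g. www.yosemite.org."""
    return _has_words(term, urlsplit(url).hostname or "")

def match_domain(term, url):
    """domain: compare only with the top-level domain, e.g. gov or org."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1] == term.lower()

park_page = "http://www.parks.gov/yosemite.htm"
org_site = "http://www.yosemite.org/"
for url in (park_page, org_site):
    print(url, match_url("yosemite", url), match_host("yosemite", url),
          match_domain("gov", url))
# http://www.parks.gov/yosemite.htm True False True
# http://www.yosemite.org/ True True False
```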

Searching for Applets and Objects

  • applet:NervousText matches pages containing the name of the Java applet class found in an applet tag; in this case, NervousText.
  • object:Marquee matches pages containing the name of the ActiveX object found in an object tag; in this case, Marquee.

8. Why Can't I Locate A Page?

There are times when you can't seem to locate a page, even when you know what's on it. Generally, this is because the page hasn't been indexed yet. Alternatively, it may be listed, but its summary is too uninformative for you to recognize it. If the page isn't listed, there could be a number of reasons:

  • The document is on a computer behind a gateway or firewall.
  • AltaVista may not have been able to follow a link to the page because doing so requires filling out a form or taking some other action that AltaVista can't perform.
  • The page's author has requested that it not be indexed by robots or spiders.
  • The page may be an island on the Web with no other sites pointing to it. The only way it would be indexed is if someone submitted its URL to AltaVista.
  • AltaVista may not have been able to reach the page because the computer it's on was out of service or congested.
  • The page may have been renamed or removed by the owner since it was last indexed.


IV. PRACTICING SIMPLE QUERIES

There's nothing like a little practice to hone your search techniques. Go to the AltaVista site and try locating information on the following topics.

  • How was Yosemite carved out by glaciers?
  • What was the role of the Mariposa Battalion?
  • What did Dr. Lafayette H. Bunnell do?

Locations to learn about

  • Glacier Point
  • Bridalveil Falls
  • El Capitan
  • Yosemite Falls
  • Half Dome
  • The Ahwahnee Hotel
  • Tuolumne Meadows
  • Merced River and Lake
  • Tioga Pass
  • Vernal Falls
  • Mariposa Grove


V. SELF-TEST

Once you have finished this tutorial and read the reference section, you might test your understanding before you leave. Just answer these questions.

1. What is the plus sign (+) used for?

2. What is the minus sign (-) used for?

3. How do you indicate a phrase?

4. What documents are displayed when you use two words such as Vernal Falls?

5. What documents are displayed when you use a phrase such as "Vernal Falls"?

6. What keyword do you use to look for images?

7. What's the major difference between the domain: and url: keywords?

8. What would be the effect of entering Buffalo instead of buffalo in your query?

9. What would you enter to be sure you found glacier, glaciers, and glacier's?

10. What pages would be listed by the query host:yosemite.com?

Send mail to info@tramline.com with questions or comments about this Web site.
Copyright © 1996-2003 Tramline. All rights reserved. Legal Agreement.