Thursday, February 21, 2008

Google

http://www.google.com/

Use Google when...

  • you are looking for a specific fact/person/event/narrow topic
  • your topic is made up of multiple ideas
  • you can't get enough of Google's link ranking of results
  • you like Google's specialized features such as spell checking, phone book and flight lookups, stock prices, etc.
  • you want to take advantage of Google's advanced search interface that lets you fill out a form to do a search targeted to your needs

Google is a general search engine that is everyone's favorite these days. It ranks a page by the number of other pages that link to it, giving more weight to links from pages that are themselves ranked high by the service. The more highly ranked pages that link to a certain page, the higher the linked-to page will be ranked by Google. This unique ranking system can be quite effective.
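To make the idea of link-based ranking concrete, here is a minimal sketch in Python. It is a simplified illustration of ranking by incoming links and is not Google's actual implementation. The function name link_rank, the damping value of 0.85, and the four-page sample web are all assumptions made for this example.

```python
def link_rank(links, damping=0.85, iterations=100):
    """Score pages by incoming links, weighting links from high-scoring pages.

    links maps each page name to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub".
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}
scores = link_rank(web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy web, "hub" comes out on top because the most pages link to it, and "a" follows closely because it is linked to by the highly ranked "hub".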

Special Features:

  • Returns results ranked by the number of links from other pages ranked high by the service; a page's own ranking is in turn determined by the number of links to it
  • In determining relevancy ranking, the engine also looks at various textual clues including linking text
  • Suggests an alternative search when search terms are misspelled.
  • Search results include sites from the Open Directory Project, offering an interesting mix of sites from the wider Web and those chosen by editors for inclusion into the directory. See also Google's own version, the Google Web Directory.
  • OR searching is supported if "OR" is typed in CAPS, e.g., university OR college; works only with multiple single words
  • For more refined searches, use quotations for phrases ("El Nino") or a minus sign (-) for the Boolean NOT
  • I'm feeling lucky option returns the top-ranked source for a query
  • Offers searching of Web pages in a number of languages; and the Google site can be set to display only the tips and instructions in a different language
  • Has a number of special search features, listed on its page about Google Web Search Features
  • Searches the deep Web for such information as:
    • Files in Portable Document Format, Microsoft Word, Excel, and PowerPoint, Rich Text Format and PostScript
    • Images, from the Advanced Web Search interface or from Google Image Search
    • Maps from Yahoo! or MapBlast (enter an address)
    • Phone book entry (enter first and last name, and city or zip)
    • Stock prices (enter a company's ticker symbol)
    • And lots more!

Drawbacks:

  • New Web pages will not appear in your results, as it takes time for the creators of other Web pages to link to new resources, and for this activity to be reflected at Google
  • Google results can be manipulated ("bombed") by people who maintain Web pages. Bloggers and others sometimes attempt to associate words with a link to a specific Web site to make a political or other point. For example, the search terms "miserable failure" point to the official George Bush site. This site may become the number one hit on Google, even though the words are not relevant to it.
  • A number of other issues are pointed out by librarian Gary Price in his piece, A Couple of Comments about Google

We will be using Google to learn a number of search techniques.

Exercise: Multiple concept search

Query: I'd like to learn more about Richard Nixon's resignation.

Search:

  1. Type: Nixon resignation [Google defaults to Boolean AND logic]
  2. Examine results for relevancy
  3. Note the related categories from the Google Web Directory listed at the top of the results screen

This is a good example of a search tool that defaults to AND logic. It wouldn't hurt to use the plus (+) sign in front of each term, +Nixon +resignation, but this is not necessary. However, if you want a common word such as "where" or "with," you should use a plus (+) sign, e.g., +where.

Exercise: Phrase Search

For another way to ensure that all your search terms appear in documents you retrieve, use phrase searching. Enclosing a phrase within quotation marks is a syntax that works on nearly all search engines on the Web.

Query: I'd like to see information on the movie Gone with the Wind.

Search: "Gone with the Wind" [capitalization is not necessary]

Exercise: Field Search

Field searching is a way to narrow your search to specific parts of the document or record. Google offers a variety of ways to use field searching to better focus your results. First, let's try a simple search that is not a field search.

Query: I'd like to see information on slavery.

Search: slavery

This isn't the wisest search to do in a large, full-text database like Google because it brings back too many results.

Let's look into ways to focus our results by using field searching. We will try these searches using Google's basic search box. Keep in mind that most of Google's field search options are also available on their Advanced Search form that is even easier to use.

Search: intitle:slavery

This is a much better search. This search will look for slavery in the title field embedded within the HTML document. Notice how all the page titles contain the word slavery.

Search: inurl:slavery

This is also a good alternative search. This search will look for slavery in the URL of the file, e.g., in a subdirectory named slavery, or in a filename such as slavery.html. Notice how all the results contain the word slavery somewhere in the URL.
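The difference between a full-text search and a field search can be sketched in a few lines of Python. This is only an illustration of the principle, not how Google processes intitle: or inurl:. The three sample records, the example.org addresses, and the field_search helper are hypothetical.

```python
# Hypothetical records standing in for indexed Web pages.
pages = [
    {"title": "Slavery in America",
     "url": "http://example.org/history/overview.html",
     "text": "An overview of slavery in the United States."},
    {"title": "Civil War Timeline",
     "url": "http://example.org/slavery/timeline.html",
     "text": "Key dates from 1861 to 1865."},
    {"title": "Cotton Trade",
     "url": "http://example.org/economy/cotton.html",
     "text": "Cotton exports depended on slavery."},
]


def field_search(records, field, term):
    """Return the records whose given field contains the term."""
    term = term.lower()
    return [r for r in records if term in r[field].lower()]


# Plain keyword search: the term may appear anywhere in the record.
full_text_hits = [
    p for p in pages
    if any("slavery" in p[f].lower() for f in ("title", "url", "text"))
]

# Field searches: the term must appear in one specific field.
title_hits = field_search(pages, "title", "slavery")  # like intitle:slavery
url_hits = field_search(pages, "url", "slavery")      # like inurl:slavery
```

The keyword search matches all three records, while each field search matches only one. That is the narrowing effect described above.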

Exercise: Putting it all together: Phrase and Field Search

Query: I'd like information about the Mars rover missions from the NASA site.

Search: +"Mars rover" +site:nasa.gov

This is a nicely-focused search. It uses the plus (+) sign to be sure that all of our search terms appear on the retrieved documents. In addition, the phrase Mars Rover is enclosed in quotation marks, and we have narrowed our search to retrieve documents only from the NASA site.

If field searching appeals to you, Google offers a complete list of Advanced Search Operators that you can examine and try. Also, be sure to check out Google's Advanced Search page. There are many useful options there and filling out the form is easy. Remember: a focused search is more likely to bring you the results that you're looking for.
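If you build searches like this often, the pieces can be assembled with a small script. The sketch below uses Python's standard urllib.parse module. The helper names build_query and search_url are made up for this example, and it assumes the search engine reads the query from a q parameter at the /search address. The plus signs are left out because Google defaults to AND logic.

```python
from urllib.parse import urlencode


def build_query(phrase=None, terms=(), site=None, intitle=None):
    """Combine a quoted phrase, keywords, and field operators into one query."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')        # phrase search
    parts.extend(terms)                    # plain keywords (AND by default)
    if intitle:
        parts.append(f"intitle:{intitle}")  # title field
    if site:
        parts.append(f"site:{site}")        # limit to one site
    return " ".join(parts)


def search_url(query, base="http://www.google.com/search"):
    """Encode the query for use in a URL (the q parameter is an assumption)."""
    return base + "?" + urlencode({"q": query})


query = build_query(phrase="Mars rover", site="nasa.gov")
url = search_url(query)
```

Here query holds the text you would type into the search box, and url holds the same search in a form that can be bookmarked.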

Yahoo Directory

http://dir.yahoo.com/

Use the Yahoo Directory when...

  • you are using the Web for relaxation or personal use
  • you want to browse through subject categories to see what is available on your topic
  • you are willing to use a collection of generally unevaluated material

Note that the Yahoo site consists of several components. A Yahoo search defaults to a search of the general Web, using its own search technology. This page discusses the Yahoo Directory portion of the Yahoo site.

Special Features of the Yahoo Directory:

  • Is one of the largest subject directories on the Web
  • Has broad subject coverage
  • Has a hierarchical subject organization that is good for browsing

Drawbacks:

  • Accepts almost any site submitted for inclusion to its database and does not do much to evaluate for quality or accuracy
  • Makes no attempt to be balanced in any subject area; it is mostly the passive recipient of sites submitted to it
  • The Yahoo editors are so busy that they don't have time to include all submitted sites into the directory; therefore, significant and high quality sites may never make it onto Yahoo
  • There is generally sporadic coverage of academic subjects
  • Tends to index only the major landing page of a site; therefore, any significant subsidiary pages on a related or different topic may not show up on this site

Quick Tip!

The Yahoo Directory is NOT an appropriate research tool.

Most sites listed in the Yahoo directory are submitted by users, not by editors who are searching the Web for valuable content. Most annotations are written by the site creators and may therefore not be objective. Yahoo explicitly makes no guarantee that its editors check for quality or accuracy.

If you wish to use a wide-ranging directory portal, try the Open Directory Project. This directory is maintained by thousands of volunteer editors and links to nearly 2 million resources on the Web. The quality of its content is generally more reliable than the content on Yahoo.

An excellent example of the Open Directory is the version maintained by the Google search engine, called the Google Web Directory. This version utilizes Google's PageRank technology and ranks results by link popularity.

Don Busca

http://www.donbusca.com/

Use Don Busca when...

  • You want the convenience of a meta search engine that searches multiple sources simultaneously
  • Your topic is made up of multiple concepts
  • Your topic is somewhat obscure so a search across multiple sources might help

Special Features:

  • Simultaneously searches several major search engines and subject directories
  • With each result, also offers a cached version, a link to the site archive in the Wayback Machine, site info from a variety of sources and various bookmark management options

Exercise: Meta searching

Query: Does violence on television have an effect on children?

Search: +violence +television +children

BUBL

http://bubl.ac.uk/link/

Use BUBL when...

  • you are researching a relatively broad topic
  • you want to browse through subject categories to see what is available on your topic
  • you want to retrieve a small number of substantive results - in this case, usually between 5 and 15 of the most relevant resources on any given topic
  • you want to retrieve results that have been selected by information professionals

BUBL LINK is a significant, professionally-maintained directory that has been around for years. Begun as a volunteer librarian effort, it was a UK funded project hosted by the University of Strathclyde Library in Glasgow, Scotland. Since this funding ended, BUBL has been maintained by staff at the Centre for Digital Library Research at the University of Strathclyde. Its many years of experience are apparent in the breadth of its listings, useful indexing, variety of access points and cogent, well-written annotations. Its main interface, BUBL LINK/5:15 offers between 5 and 15 relevant sources for most subjects.

Special Features:

  • Offers strong coverage in academic subject areas
  • Has wide coverage within topics
  • 5:15 interface makes it easy to find highly relevant resources broken down into targeted subject categories for accurate information finding
  • Listings may be accessed by: Subject, A-Z, Dewey Decimal Classification, and Types, e.g., biographies, essays, image collections, directories
  • Coverage is noticeably worldwide
  • Has a user-friendly search form
  • Updates to the directory are listed each month

Let's explore BUBL LINK to get an idea of its coverage.

Exercise: exploring a professional directory

  1. Select the Science and Mathematics category
  2. On the next screen, notice the large number of subtopics from which to choose
  3. Select a topic that interests you, and keep selecting until you retrieve a list of recommended sites
  4. On the final results screen, notice the link index on the left side of the screen. On the right, you'll see that each listing includes a title, annotation, author and Dewey class.

Here is a list of other recommended subject directories with evaluated content. It is highly recommended that they become a part of your research repertoire.

INFOMINE: Scholarly Internet Resource Collections

http://infomine.ucr.edu/
INFOMINE is a large collection of scholarly Internet resources collectively maintained by libraries of the University of California and offering many interesting search and retrieval options.

Librarians' Internet Index

http://www.lii.org/
The LII is a well-organized, selective, and continually updated collection maintained by a large number of indexers in California.

Intute

http://www.intute.ac.uk/
This collection from Great Britain is a gateway to large academic-oriented collections in major subject disciplines.

Brainboost

http://www.brainboost.com/

Use Brainboost when...

  • you are looking for the answer to a fact-based question that usually has one correct answer
  • you want to ask your question in plain English

Special Features:

  • Accepts plain English queries, so there are no searching rules to learn
  • Tends to work best with queries that have precise and factual answers
  • Offers a set of related questions that also retrieve results

Drawbacks:

  • Not all questions may be answered with equal accuracy

Exercise: Plain English query

Query: Where can I learn to use a mouse?

Search: Where can I learn to use a mouse?

Examine the results. Note that Brainboost did a good job of distinguishing the concept of mouse as an animal and mouse as a computer tool based on the way we asked the question. If you ask What is a mouse? you will get dictionary definitions that cover different uses of this word.

Tuesday, February 19, 2008

Boolean Searching On the Internet

This post is about the principles of search logic and the different manifestations of this logic on Web search engines.

The Internet is a vast computer database. As such, its contents must be searched according to the rules of computer database searching. Much database searching is based on the principles of Boolean logic. Boolean logic refers to the logical relationship among search terms, and is named for the British-born Irish mathematician George Boole.

On Internet search engines, the options for constructing logical relationships among search terms extend beyond the traditional practice of Boolean searching. This will be covered in the section below, Boolean Searching on the Internet.

Boolean logic consists of three logical operators:

  • OR
  • AND
  • NOT

Each operator can be visually described by using Venn diagrams, as shown below.


OR

Venn diagram for OR

college OR university

Query: I would like information about college.

  • In this search, we will retrieve records in which AT LEAST ONE of the search terms is present. We are searching on the terms college and also university since documents containing either of these words might be relevant.
  • This is illustrated by:
  • the shaded circle with the word college representing all the records that contain the word "college"
  • the shaded circle with the word university representing all the records that contain the word "university"
  • the shaded overlap area representing all the records that contain both "college" and "university"

OR logic is most commonly used to search for synonymous terms or concepts.

Here is an example of how OR logic works:

Search terms             Results

college                  396,482
university               590,791
college OR university    819,214

OR logic collates the results to retrieve all the unique records containing one term, the other, or both.
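One way to picture this is with sets, where each set holds the records that contain a term. The sketch below uses Python sets with made-up record numbers. Only the three counts at the end come from the example above.

```python
# Hypothetical record numbers; each set holds the records containing a term.
college = {1, 2, 3, 4}
university = {3, 4, 5, 6, 7}

either = college | university   # college OR university
both = college & university     # records that contain both terms

# OR counts each record once, so the overlap must not be counted twice.
assert len(either) == len(college) + len(university) - len(both)

# The same rule applied to the counts shown above tells us how many
# records contain both "college" and "university".
overlap = 396_482 + 590_791 - 819_214
```

This is why the OR result (819,214) is smaller than the two individual counts added together: 168,059 records contain both words and are retrieved only once.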

The more terms or concepts we combine in a search with OR logic, the more records we will retrieve.

Venn diagram for OR

For example:

Search terms                       Results

college                            396,482
university                         590,791
college OR university              819,214
college OR university OR campus    929,677


AND

Venn diagram for AND

poverty AND crime

Query: I'm interested in the relationship between poverty and crime.

  • In this search, we retrieve records in which BOTH of the search terms are present
  • This is illustrated by the shaded area overlapping the two circles representing all the records that contain both the word "poverty" and the word "crime"
  • Notice how we do not retrieve any records with only "poverty" or only "crime"

Here is an example of how AND logic works:

Search terms         Results

poverty              76,342
crime                348,252
poverty AND crime    12,998

The more terms or concepts we combine in a search with AND logic, the fewer records we will retrieve.

Venn diagram for AND

For example:

Search terms                    Results

poverty                         76,342
crime                           348,252
poverty AND crime               12,998
poverty AND crime AND gender    1,220
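The narrowing effect of AND can be sketched with a tiny index in Python. This is an illustration under simple assumptions, not the way any particular search engine stores its data. The four sample documents and the search_and helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, keyed by record number.
documents = {
    1: "poverty and crime in cities",
    2: "crime statistics by gender",
    3: "poverty crime and gender studies",
    4: "rural poverty",
}

# Build an index that maps each word to the records containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word].add(doc_id)


def search_and(*terms):
    """Return the records that contain every one of the terms."""
    result = None
    for term in terms:
        postings = set(index.get(term, set()))
        result = postings if result is None else result & postings
    return result or set()


two_terms = search_and("poverty", "crime")
three_terms = search_and("poverty", "crime", "gender")
```

Each added term can only keep or remove records from the result, never add new ones, so the result set shrinks as terms are added.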

A few Internet search engines make use of the proximity operator NEAR. A proximity operator determines the closeness of terms within the text of a source document. NEAR is a restrictive AND: both terms must be present, and they must also appear close to each other. How close the terms must be is determined by the particular search engine. Google uses proximity searching by default.
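A proximity test can be sketched as a check on word positions. The Python function below is a simplified illustration. The name near and the max_distance parameter are made up for this example, and each real search engine sets its own distance.

```python
def near(text, term_a, term_b, max_distance=10):
    """True if the two terms occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(
        abs(a - b) <= max_distance
        for a in positions_a
        for b in positions_b
    )


close_text = "cats and dogs often share a home"
far_text = "cats " + "filler " * 20 + "dogs"

close_match = near(close_text, "cats", "dogs", max_distance=3)
far_match = near(far_text, "cats", "dogs", max_distance=3)
```

Both sample texts would satisfy cats AND dogs, but only the first satisfies the NEAR test, which is what makes NEAR the more restrictive operator.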


NOT

Venn diagram for NOT

cats NOT dogs

Query: I want information about cats, but I want to avoid anything about dogs.

  • In this search, we retrieve records in which ONLY the first term is present: records that contain the word "cats" and do not contain the word "dogs"
  • This is illustrated by the shaded area with the word cats representing all the records containing the word "cats"
  • No records are retrieved in which the word "dogs" appears, even if the word "cats" appears there too

Here is an example of how NOT logic works:

Search terms     Results

cats             86,747
dogs             130,424
cats NOT dogs    65,223

NOT logic excludes records from your search results. Be careful when you use NOT: the term you do want may be present in an important way in documents that also contain the word you wish to avoid.
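In set terms, NOT is a subtraction. The Python sketch below uses made-up record numbers to show which records are lost. Only the two counts at the end come from the example above.

```python
# Hypothetical record numbers; each set holds the records containing a term.
cats = {1, 2, 3, 4, 5}
dogs = {4, 5, 6}

cats_not_dogs = cats - dogs    # cats NOT dogs
dropped = cats & dogs          # records about cats that were excluded

# Applied to the counts shown above: records that contain "cats"
# but were removed because they also contain "dogs".
dropped_count = 86_747 - 65_223
```

In the example above, 21,524 records that mention cats are excluded. Some of them may have been exactly what you were looking for, which is the reason for the caution.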

Boolean Searching on the Internet

When you use an Internet search engine, the use of Boolean logic may be manifested in three distinct ways:

  1. Full Boolean logic with the use of the logical operators
  2. Implied Boolean logic with keyword searching
  3. Predetermined language in a user fill-in template

1. Full Boolean logic with the use of the logical operators

Few search engines nowadays offer the option to do full Boolean searching with the use of the Boolean logical operators. It is more common for them to offer simpler methods of constructing search statements, specifically implied Boolean logic and template language. These methods are covered below.

If you want to construct search queries using Boolean logical operators, you will need to experiment with search engines and see what happens when you search. You can try some of the search statements shown below.

Examples:

Query: I need information about cats.

Boolean logic: OR

Search: cats OR felines

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: dyslexia AND adults

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: radiation NOT nuclear

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: (cats OR felines) AND behavior

Note: Use of parentheses in this search is known as forcing the order of processing. In this case, we surround the OR words with parentheses so that the search engine will process the two related terms first. Next, the search engine will combine this result with the last part of the search that involves the second concept. Using this method, we are assured that the semantically-related OR terms are kept together as a logical unit.
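The effect of the parentheses can be seen with Python sets, which follow a similar order of processing: the AND operator (&) is applied before the OR operator (|) unless parentheses say otherwise. The record numbers are made up for this sketch.

```python
# Hypothetical record numbers; each set holds the records containing a term.
cats = {1, 2}
felines = {2, 3}
behavior = {2, 3, 4}

# Parentheses keep the two related terms together as one concept.
grouped = (cats | felines) & behavior

# Without them, AND is processed first, so this means
# cats OR (felines AND behavior).
ungrouped = cats | felines & behavior
```

Record 1 mentions cats but says nothing about behavior. It is absent from grouped but slips into ungrouped, which is the kind of irrelevant result the parentheses prevent.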

2. Implied Boolean logic with keyword searching

Keyword searching refers to a search type in which you enter terms representing the concepts you wish to retrieve. Boolean operators are not used.

Implied Boolean logic refers to a search in which symbols are used to represent Boolean logical operators. In this type of search on the Internet, the absence of a symbol is also significant, as the space between keywords defaults to either OR logic or AND logic. Nowadays, most search engines default to AND.

Implied Boolean logic has become so common in Web searching that it may be considered a de facto standard.

Examples:

Query: I need information about cats.

Boolean logic: OR

Search: [None]

It is extremely rare for a search engine to interpret the space between keywords as the Boolean OR. Rather, the space between keywords is interpreted as AND. To do an OR search, choose either option #1 above (full Boolean logic) or option #3 below (user fill-in template).

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: +dyslexia +adults

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: radiation -nuclear

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: [none]

Since this query involves an OR search, it cannot be done with keyword searching. To conduct this type of search, choose either option #1 above (full Boolean logic) or option #3 below (user fill-in template).
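The plus and minus syntax is simple enough to interpret with a short script. The Python sketch below is one possible reading of implied Boolean syntax, assuming that a bare keyword is treated as AND. The function names parse_implied and matches are made up for this example.

```python
import shlex


def parse_implied(query):
    """Split a +/- query into required and excluded terms.

    Quoted phrases stay together. An unbalanced quotation mark makes
    shlex.split raise ValueError.
    """
    required, excluded = [], []
    for token in shlex.split(query):
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())          # -term means NOT
        else:
            required.append(token.lstrip("+").lower())  # +term or bare term means AND
    return required, excluded


def matches(text, query):
    """True if the text has every required term and no excluded term."""
    required, excluded = parse_implied(query)
    text = text.lower()
    return (all(term in text for term in required)
            and not any(term in text for term in excluded))


and_terms = parse_implied("+dyslexia +adults")
not_terms = parse_implied("radiation -nuclear")
phrase_terms = parse_implied('+"drug policy" +"United States"')
```

Notice that nothing in this syntax can express OR, which matches the [none] entries in the examples above.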

3. Predetermined language in a user fill-in template

Some search engines offer a search template which allows the user to choose the Boolean operator from a menu. Usually the logical operator is expressed with substitute language rather than with the operator itself.

Examples:

Query: I need information about cats

Boolean logic: OR

Search: Any of these words/Can contain the words/Should contain the words

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: All of these words/Must contain the words

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: Must not contain the words/Should not contain the words

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: Combine options as above if the template allows multiple search statements
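Behind the form, template choices correspond to ordinary Boolean operators. The Python sketch below shows one way the three template fields could be translated into a full Boolean statement. The function template_to_boolean and its parameter names are hypothetical and do not describe any particular search engine's form.

```python
def template_to_boolean(all_words=(), any_words=(), none_words=()):
    """Translate fill-in template fields into a full Boolean search statement."""
    parts = list(all_words)                                # "all of these words"
    if any_words:
        parts.append("(" + " OR ".join(any_words) + ")")   # "any of these words"
    query = " AND ".join(parts)
    for word in none_words:                                # "must not contain the words"
        query += f" NOT {word}"
    return query.strip()


cat_behavior = template_to_boolean(all_words=["behavior"],
                                   any_words=["cats", "felines"])
radiation = template_to_boolean(all_words=["radiation"],
                                none_words=["nuclear"])
```

The "any of these words" field is wrapped in parentheses so that the OR terms stay together as a single concept when combined with the other fields.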

Quick Comparison Chart:
Full Boolean vs. Implied Boolean vs. Templates

            Full Boolean            Implied Boolean                     Template Terminology

OR          college or university   [rarely available] *see note below  any of these words / can contain the words / should contain the words
AND         poverty and crime       +poverty +crime                     all of these words / must contain the words
NOT         cats not dogs           cats -dogs                          must not contain the words / should not contain the words
NEAR, etc.  cats near dogs          N/A                                 near

* Most multi-term search statements will resolve to AND logic at search engines that use AND as the default. Nowadays most search engines default to AND. Always play it safe, however, and consult the Help files at each site to find out which logic is the default.

Where to Search:
A Selected List

Feature: Boolean operators
Search Engine: Dogpile | Google [OR only] | Ixquick

Feature: Full Boolean logic with parentheses, e.g., behavior and (cats or felines)
Search Engine: AlltheWeb Advanced Search | AltaVista Advanced Web Search | Ixquick | Live Search

Feature: Implied Boolean +/-
Search Engine: Most search engines offer this option

Feature: Boolean logic using search form terminology
Search Engine: Most advanced search options offer this, including: AlltheWeb Advanced Search | AltaVista Advanced Web Search | AOL Advanced Search | Ask.com Advanced Search | Google Advanced Search | Yahoo Advanced Web Search

Feature: Proximity operators
Search Engine: Exalead | Google [by default] | Ixquick

General Search Strategies


  • Most search engines employ the principles of Boolean logic in the formulation of search queries. See Boolean Searching on the Internet for detailed information about search strategy. If you take the time to understand the basics of Boolean logic, you will have a better chance of search success.

  • Search engines tend to have a default Boolean logic. This means that the space between multiple search terms defaults to either OR logic or AND logic. This has become a de facto standard. It is imperative that you know which logical operator is the default. Nowadays, the default logic tends to be AND, but you should always check the site's Help file to make sure.

  • Another de facto standard is the requirement to search for phrases within quotations, e.g., "death penalty".

  • If proximity operators (e.g., NEAR) are available, use them rather than specifying an AND relationship between your keywords. This will make sure that your search terms are located near each other in the full text document. The closer together your terms appear, the more likely the document is to be relevant. Google does proximity searching by default. See Boolean Searching on the Internet for a list of more sites that offer proximity searching.

  • Field searching is another extremely important way of limiting your search results in large search engines that contain millions of full-text files. For example,

TITLE:slavery

in a search engine such as AltaVista will bring you more relevant hits than merely searching on the keyword slavery.

  • To enhance subject searches, try the URL field to narrow your results. The URL field offers a good way to search for certain subject terms. This is because of the make-up of the URL.

Anatomy of a URL

This is a URL on the CNN home page:

http://www.cnn.com/feedback/comments.html

This URL is typical of addresses hosted in domains in the United States. Structure of this URL:

  1. Protocol: http
  2. Host computer name: www
  3. Second-level domain name: cnn
  4. Top-level domain name: com
  5. Directory name: feedback
  6. File name: comments.html

The directory name and file name often contain subject terms. These can be searched with the URL field.
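The same breakdown can be produced with Python's standard urllib.parse and pathlib modules. This is a minimal sketch: splitting the host name on dots works for a simple address like this one, but it is a naive assumption that does not hold for every domain (for example, addresses ending in .co.uk).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "http://www.cnn.com/feedback/comments.html"

parts = urlsplit(url)
host_labels = parts.hostname.split(".")   # naive split of the host name on dots
path = PurePosixPath(parts.path)

anatomy = {
    "protocol": parts.scheme,
    "host computer name": host_labels[0],
    "second-level domain name": host_labels[-2],
    "top-level domain name": host_labels[-1],
    "directory name": path.parent.name,
    "file name": path.name,
}
```

The last two entries, the directory name and the file name, are the parts most likely to carry subject terms.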

For example:

URL:slavery

will give you more relevant results than the keyword slavery by searching for this term as a directory name or a file name.

  • To find a home page when you know the location or sponsor of the information, use the SITE field. In this case, you search on the top-level and second-level domain names together, and then use AND logic to add subject terms to your search.

Examples of sites:

mit.edu
nasa.gov
microsoft.com

For example, if you are searching for information about spacewalks conducted by NASA, go to AltaVista and try something like this:

+site:nasa.gov +spacewalks

This search will limit your results to files at the NASA Web site.
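A site restriction amounts to a test on the host name of each result. The Python sketch below shows one way to write that test. The function on_site and the sample addresses are made up for this example and are not how any search engine implements the SITE field.

```python
from urllib.parse import urlsplit


def on_site(url, site):
    """True if the URL's host is the given site or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower()
    return host == site or host.endswith("." + site)


# Hypothetical result URLs.
results = [
    "http://spaceflight.nasa.gov/station/spacewalks.html",
    "http://www.nasa.gov/missions/index.html",
    "http://www.notnasa.gov/spacewalks.html",
    "http://nasa.gov.example.com/spacewalks.html",
]
nasa_only = [u for u in results if on_site(u, "nasa.gov")]
```

Matching on the end of the host name, with the dot included, keeps out look-alike addresses that merely contain the text nasa.gov.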

  • Beware of searching on three-letter top-level domains to narrow your search. Do NOT try to search for the URL edu or com. There are too many pages in these domains for the search engine to handle. On the other hand, searching for the URL gov may be more successful because there are far fewer of these pages. Still, all searches on top-level domains should be used with caution.

Keep in mind that there are a few search services that specialize in retrieving Web pages from individual top-level domains. For example:

Use these specialty engines when you wish to limit your results to these domains, as your results are more likely to be accurate and comprehensive.

  • Limiting a search by a two-letter country code, also a top-level domain, might be a viable option. Take a look at this list of ISO 3166 Internet country codes.

    Quick Tip!

    Best Bet Search Syntax

    • Place the plus sign ( + ) in front of all words you wish to retrieve

    +hibernation +bears

    • Place a phrase within quotations

    "freedom of the press"

    • Putting it all together:

    +"drug policy" +"United States"