Tuesday, February 19, 2008

Boolean Searching On the Internet

This post covers the principles of search logic and the different ways this logic appears on Web search engines.

The Internet is a vast computer database. As such, its contents must be searched according to the rules of computer database searching. Much database searching is based on the principles of Boolean logic. Boolean logic refers to the logical relationship among search terms, and is named for the British-born Irish mathematician George Boole.

On Internet search engines, the options for constructing logical relationships among search terms extend beyond the traditional practice of Boolean searching. This will be covered in the section below, Boolean Searching on the Internet.

Boolean logic consists of three logical operators:

  • OR
  • AND
  • NOT

Each operator can be visually described by using Venn diagrams, as shown below.


OR

Venn diagram for OR

college OR university

Query: I would like information about college.

  • In this search, we will retrieve records in which AT LEAST ONE of the search terms is present. We are searching on the terms college and also university since documents containing either of these words might be relevant.
  • This is illustrated by:
      • the shaded circle with the word college representing all the records that contain the word "college"
      • the shaded circle with the word university representing all the records that contain the word "university"
      • the shaded overlap area representing all the records that contain both "college" and "university"

OR logic is most commonly used to search for synonymous terms or concepts.

Here is an example of how OR logic works:

Search terms                       Results
college                            396,482
university                         590,791
college OR university              819,214

OR logic collates the results to retrieve all the unique records containing one term, the other, or both.

The more terms or concepts we combine in a search with OR logic, the more records we will retrieve.

Venn diagram for OR

For example:

Search terms                       Results
college                            396,482
university                         590,791
college OR university              819,214
college OR university OR campus    929,677
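
One way to picture OR logic is as a set union. The sketch below is only an illustration, not how any particular search engine is built: the `index` dictionary, its record IDs, and the `search_or` function are all invented for the example.

```python
# Hypothetical toy index: each term maps to the IDs of the records containing it.
index = {
    "college": {1, 2, 3, 5},
    "university": {3, 4, 5, 6},
    "campus": {6, 7},
}

def search_or(*terms):
    """Return the IDs of records containing at least one of the terms."""
    results = set()
    for term in terms:
        results |= index.get(term, set())  # union: duplicates are counted once
    return results

two_terms = search_or("college", "university")
three_terms = search_or("college", "university", "campus")

print(sorted(two_terms))    # [1, 2, 3, 4, 5, 6]
print(sorted(three_terms))  # [1, 2, 3, 4, 5, 6, 7]
```

The same idea explains the counts in the table above. If they all come from one database, 396,482 + 590,791 = 987,273 is larger than 819,214 because the 168,059 records that contain both words are counted only once.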


AND

Venn diagram for AND

poverty AND crime

Query: I'm interested in the relationship between poverty and crime.

  • In this search, we retrieve records in which BOTH of the search terms are present
  • This is illustrated by the shaded area overlapping the two circles representing all the records that contain both the word "poverty" and the word "crime"
  • Notice how we do not retrieve any records with only "poverty" or only "crime"

Here is an example of how AND logic works:

Search terms                       Results
poverty                            76,342
crime                              348,252
poverty AND crime                  12,998

The more terms or concepts we combine in a search with AND logic, the fewer records we will retrieve.

Venn diagram for AND

For example:

Search terms                       Results
poverty                            76,342
crime                              348,252
poverty AND crime                  12,998
poverty AND crime AND gender       1,220
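
AND logic can be pictured as a set intersection. This is a minimal sketch under the same kind of assumptions as any toy example: the `index` contents and the `search_and` function are hypothetical, chosen only to show why each added term shrinks the result.

```python
# Hypothetical toy index: each term maps to the IDs of the records containing it.
index = {
    "poverty": {1, 2, 3, 4},
    "crime": {2, 3, 4, 5, 6},
    "gender": {3, 6, 7},
}

def search_and(*terms):
    """Return the IDs of records containing every one of the terms."""
    sets = [index.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.intersection(*sets)  # keep only IDs present in all sets

two_terms = search_and("poverty", "crime")
three_terms = search_and("poverty", "crime", "gender")

print(sorted(two_terms))    # [2, 3, 4]
print(sorted(three_terms))  # [3]
```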

A few Internet search engines make use of the proximity operator NEAR. A proximity operator determines the closeness of terms within the text of a source document. NEAR is a restrictive AND: both terms must be present, and they must also appear close to each other. How close is determined by the particular search engine. Google applies proximity searching by default.
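
To make the idea of a "restrictive AND" concrete, here is a rough sketch of a proximity check. The `near` function and its `max_distance` parameter are assumptions made for illustration; real search engines choose their own distance and their own way of splitting text into words.

```python
def near(text, term_a, term_b, max_distance=10):
    """Return True if both terms occur within max_distance words of each other."""
    # Naive tokenizer: split on whitespace and trim common punctuation.
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance
               for i in positions_a for j in positions_b)

doc = "Poverty is often discussed alongside crime in urban studies."

print(near(doc, "poverty", "crime", max_distance=5))  # True: the words are 5 apart
print(near(doc, "poverty", "crime", max_distance=3))  # False: AND would match, NEAR does not
```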


NOT

Venn diagram for NOT

cats NOT dogs

Query: I want information about cats, but I want to avoid anything about dogs.

  • In this search, we retrieve records that contain the first term but NOT the second
  • This is illustrated by the shaded area with the word cats representing all the records containing the word "cats"
  • No records are retrieved in which the word "dogs" appears, even if the word "cats" appears there too

Here is an example of how NOT logic works:

Search terms                       Results
cats                               86,747
dogs                               130,424
cats NOT dogs                      65,223

NOT logic excludes records from your search results. Be careful when you use NOT: the term you do want may be present in an important way in documents that also contain the word you wish to avoid.
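
NOT logic behaves like a set difference, and a small example shows why the warning above matters. The `records` and the `containing` helper below are invented for illustration.

```python
# Hypothetical records, keyed by record ID.
records = {
    1: "caring for cats",
    2: "cats and dogs living together",
    3: "training dogs",
    4: "why cats outnumber dogs as pets",
}

def containing(term):
    """Return the IDs of records whose text contains the term as a whole word."""
    return {rid for rid, text in records.items() if term in text.lower().split()}

cats_not_dogs = containing("cats") - containing("dogs")  # set difference

print(sorted(containing("cats")))  # [1, 2, 4]
print(sorted(cats_not_dogs))       # [1]
```

Record 4 is mainly about cats, but it is dropped because it mentions dogs. In the table above, assuming the counts come from one database, 86,747 - 65,223 = 21,524 records mention both words and are excluded in the same way.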

Boolean Searching on the Internet

When you use an Internet search engine, the use of Boolean logic may be manifested in three distinct ways:

  1. Full Boolean logic with the use of the logical operators
  2. Implied Boolean logic with keyword searching
  3. Predetermined language in a user fill-in template

1. Full Boolean logic with the use of the logical operators

Few search engines nowadays offer the option to do full Boolean searching with the use of the Boolean logical operators. It is more common for them to offer simpler methods of constructing search statements, specifically implied Boolean logic and template language. These methods are covered below.

If you want to construct search queries using Boolean logical operators, you will need to experiment with search engines and see what happens when you search. You can try some of the search statements shown below.

Examples:

Query: I need information about cats.

Boolean logic: OR

Search: cats OR felines

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: dyslexia AND adults

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: radiation NOT nuclear

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: (cats OR felines) AND behavior

Note: Use of parentheses in this search is known as forcing the order of processing. In this case, we surround the OR words with parentheses so that the search engine will process the two related terms first. Next, the search engine will combine this result with the last part of the search that involves the second concept. Using this method, we are assured that the semantically-related OR terms are kept together as a logical unit.
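
The effect of the parentheses can be checked with a small set-based sketch. The `index` contents are hypothetical. In Python, `|` plays the role of OR and `&` the role of AND, and `&` is evaluated before `|`, which here stands in for a system that processes AND first. Search engines differ on this point, which is a good reason to use parentheses.

```python
# Hypothetical toy index: each term maps to the IDs of the records containing it.
index = {
    "cats": {1, 2, 3},
    "felines": {3, 4},
    "behavior": {2, 4, 5},
}

# (cats OR felines) AND behavior: the OR terms are combined first.
grouped = (index["cats"] | index["felines"]) & index["behavior"]

# cats OR felines AND behavior, with AND processed first.
ungrouped = index["cats"] | index["felines"] & index["behavior"]

print(sorted(grouped))    # [2, 4]
print(sorted(ungrouped))  # [1, 2, 3, 4]
```

Without the parentheses, records 1 and 3 are retrieved even though they say nothing about behavior.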

2. Implied Boolean logic with keyword searching

Keyword searching refers to a search type in which you enter terms representing the concepts you wish to retrieve. Boolean operators are not used.

Implied Boolean logic refers to a search in which symbols are used to represent Boolean logical operators. In this type of search on the Internet, the absence of a symbol is also significant, as the space between keywords defaults to either OR logic or AND logic. Nowadays, most search engines default to AND.

Implied Boolean logic has become so common in Web searching that it may be considered a de facto standard.

Examples:

Query: I need information about cats.

Boolean logic: OR

Search: [None]

It is extremely rare for a search engine to interpret the space between keywords as the Boolean OR. Rather, the space between keywords is interpreted as AND. To do an OR search, choose either option #1 above (full Boolean logic) or option #3 below (user fill-in template).

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: +dyslexia +adults

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: radiation -nuclear

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: [None]

Since this query involves an OR search, it cannot be done with keyword searching. To conduct this type of search, choose either option #1 above (full Boolean logic) or option #3 below (user fill-in template).
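
As a rough sketch of how implied Boolean logic might be interpreted, the code below treats a leading minus sign as NOT and everything else as AND, the default described above. The functions `parse_implied` and `matches` are hypothetical and do not reproduce any real search engine's parser.

```python
def parse_implied(query):
    """Split an implied-Boolean query into required and excluded terms."""
    required, excluded = [], []
    for token in query.split():
        if token.startswith("-"):
            term = token[1:].lower()
            if term:
                excluded.append(term)      # -term means NOT term
        else:
            term = token.lstrip("+").lower()
            if term:
                required.append(term)      # +term or bare term means AND term
    return required, excluded

def matches(text, query):
    """Return True if the text satisfies the implied-Boolean query."""
    words = set(text.lower().split())
    required, excluded = parse_implied(query)
    return (all(term in words for term in required)
            and not any(term in words for term in excluded))

print(parse_implied("+dyslexia +adults"))                      # (['dyslexia', 'adults'], [])
print(parse_implied("radiation -nuclear"))                     # (['radiation'], ['nuclear'])
print(matches("solar radiation levels", "radiation -nuclear"))  # True
print(matches("nuclear radiation leak", "radiation -nuclear"))  # False
```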

3. Predetermined language in a user fill-in template

Some search engines offer a search template which allows the user to choose the Boolean operator from a menu. Usually the logical operator is expressed with substitute language rather than with the operator itself.

Examples:

Query: I need information about cats.

Boolean logic: OR

Search: Any of these words/Can contain the words/Should contain the words

Query: I'm interested in dyslexia in adults.

Boolean logic: AND

Search: All of these words/Must contain the words

Query: I'm interested in radiation, but not nuclear radiation.

Boolean logic: NOT

Search: Must not contain the words/Should not contain the words

Query: I want to learn about cat behavior.

Boolean logic: OR, AND

Search: Combine options as above if the template allows multiple search statements
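
Behind the scenes, a template of this kind can be thought of as a form that assembles a Boolean statement for you. The sketch below assumes a form with three fields; the function `build_query` and the field names `all_words`, `any_words`, and `none_words` are invented for illustration and do not correspond to any specific search engine.

```python
def build_query(all_words=(), any_words=(), none_words=()):
    """Assemble a Boolean search statement from three template fields."""
    parts = list(all_words)                     # "all of these words" -> AND
    if any_words:
        group = " OR ".join(any_words)          # "any of these words" -> OR
        if (parts or none_words) and len(any_words) > 1:
            group = f"({group})"                # keep the OR terms together
        parts.append(group)
    query = " AND ".join(parts)
    for word in none_words:                     # "must not contain" -> NOT
        query = f"{query} NOT {word}".strip()
    return query

print(build_query(any_words=["cats", "felines"]))
# cats OR felines
print(build_query(all_words=["dyslexia", "adults"]))
# dyslexia AND adults
print(build_query(all_words=["radiation"], none_words=["nuclear"]))
# radiation NOT nuclear
print(build_query(all_words=["behavior"], any_words=["cats", "felines"]))
# behavior AND (cats OR felines)
```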

Quick Comparison Chart:
Full Boolean vs. Implied Boolean vs. Templates


            Full Boolean            Implied Boolean       Template Terminology

OR          college or university   [rarely available]    any of these words
                                    *see note below       can contain the words
                                                          should contain the words

AND         poverty and crime       +poverty +crime       all of these words
                                                          must contain the words

NOT         cats not dogs           cats -dogs            must not contain the words
                                                          should not contain the words

NEAR, etc.  cats near dogs          N/A                   near

* Most multi-term search statements will resolve to AND logic at search engines that use AND as the default. Nowadays most search engines default to AND. Always play it safe, however, and consult the Help files at each site to find out which logic is the default.

Where to Search:
A Selected List

Feature: Boolean operators
Search engines: Dogpile | Google [OR only] | Ixquick

Feature: Full Boolean logic with parentheses, e.g., behavior and (cats or felines)
Search engines: AlltheWeb Advanced Search | AltaVista Advanced Web Search | Ixquick | Live Search

Feature: Implied Boolean +/-
Search engines: Most search engines offer this option

Feature: Boolean logic using search form terminology
Search engines: Most advanced search options offer this, including AlltheWeb Advanced Search | AltaVista Advanced Web Search | AOL Advanced Search | Ask.com Advanced Search | Google Advanced Search | Yahoo Advanced Web Search

Feature: Proximity operators
Search engines: Exalead | Google [by default] | Ixquick
