Department of Computer Science.

Georgia State University

CSc 2311                                                                                                          J. L. Bhola

Spring 2008 - Assignment #5

Due April 10th 2008

Objectives:

1.     To gain experience with string objects.

2.     To gain experience with vectors/arrays.

3.     To gain experience with generic algorithms.

 

Description of program:

You are to write a class that finds anagrams (you can name the class anything). An anagram of a word is a permutation of the letters in that word; for example, “stop” is an anagram of “tops”.

Now write a driver program name anagram.cpp that invoke the above class to process the functions.

The input to the program is a list of words. The output is a list containing the same words, but with anagrams displayed on the same line.

 
The input file will be supplied via redirection and may look like:
 

Pans

Pots

opt

Sit

it’s

snap
 

and so on.
 

Execute the program by redirection, example:
 

C:\>anagram <words.dat
 

Pans snap

Pots

Sit it’s

opt
 

Here are some requirements for the program:

1.     When determining if two words are anagrams, the program must treat upper and lower case letters as equivalent (thus “Pans” and “snap” are anagrams) and ignore punctuation marks (“it’s” and “Sit” are anagrams). However, the program must display words with their original capitalization and punctuation – as shown above on the screen.

2.     The “word” is assumed to be any series of nonblank characters; words may be separated by any number of white-space characters. Any number of words may appear on a line, including none. You may assume that no word is more than 15 characters long. And maximum number of words in the file would be 50.

3.     The program must work even if the input file is empty. If this is the case print a message saying that “the input file is empty” and then terminate the program.

4.     The program must test the number of characters per word. If a word consists of more than 15 characters, the program should ignore that word and continue. That word would also be ignored in the total number of words of 50.

5.     The program must also test the number of words. If there are more than 50 words, print a message saying that there are more than 50 words in the input file and then terminate the program

 
Algorithm:

 
Use efficient algorithms. The first insight is to create a “signature” for each word by sorting the letters in the word.

 

Original Word                   Signature

Pans                                                                                          anps

Pots                                                                                         opts

opt                                                                                            opt

Sit                                                                                             ist

it’s                                                                                            ist

snap                                                                                          anps

 

Upper cases are converted to lower case and non-alphabetic characters are removed before the signature is computed. Punctuation marks are also ignored when computing signatures.

 

Creating signatures for words makes it easy to check whether two words are anagrams. However, we still have the problem of (apparently) having to compare every input word against every other input word. Actually, all we need to do is sort the lines by their signatures.

 

Original Word                   Signature

Pans                                                                                             anps

snap                                                                                             anps

Sit                                                                                                ist

it’s                                                                                               ist

Pots                                                                                            opts

opt                                                                                              opt

 

We next make a pass over from top to bottom, printing the original words. The words that have the same signature are anagrams, so we print them on the same line:

 

Pans snap

Pots

Sit it’s

opt

 

Implementation:

Use as many generic algorithms as possible so that the size of the program can be reduced.

Hint: You could use an array of objects to store both the words and the signatures.

 

 

 

What to turn in:

 

1.     A 3.5 inches floppy disk (IBM compatible) consisting of the .cpp and the  .exe files. MAKE SURE THAT YOUR DISK IS VIRUS CLEAN.

2.     A print out of each of the  .cpp file (source code) and .h (header file) if you used separate files.

3.     Make sure that your name is written clearly both on the print out and the disk.

4.     Make sure that both the .cpp and .exe files are on the root directory i.e. on the a:\  drive. Example:   
                                     a:\ anagram.cpp

                                     a:\ anagram.exe

 Should you use any subdirectory (whatsoever) your program would not be graded and you will receive a 0 (zero).