Difference between revisions of "CISC220 F2023 Lab10"

Latest revision as of 09:25, 5 December 2023

Lab #10

1. Hash tables

Starter code for an abstract class String_HT and two derived classes Chain_String_HT and Probe_String_HT is provided here. Several dictionaries ranging in size from 100 to ~84K English words, as well as text files for testing spellchecking, are also supplied along with the code.

The executable is called as follows: hashcheck <command filename>. Each command file initializes a hash table from a dictionary on the first line:

CHAIN filename: Count words in filename, compute appropriately-size hash table, call constructor of Chain_String_HT class which uses separate chaining for collision resolution, and insert every word into that hash table. Chains are implemented with STL vector class. Only output is printing the dictionary hash table's WORD_COUNT TABLE_SIZE followed by a newline
PROBE filename: Same as previous command but call constructor for Probe_String_HT class which uses quadratic probing for collision resolution. This scheme should use lazy deletion and follow the probing sequence f(1) = +1, f(2) = -4, f(3) = +9, f(4) = -16, and so on.

The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object:

INSERT word: Insert word into hash table following appropriate collision scheme. Update word count and table size variables as necessary. If the word is already in the table, don't insert a duplicate. If the new load factor would be >= MAX_LOAD then re-hash by calling expand_and_rehash(). Once again, only output (after the insertion and bookkeeping) WORD_COUNT TABLE_SIZE
REMOVE word: Remove word from hash table if it is there. Again output (after the removal and bookkeeping) WORD_COUNT TABLE_SIZE
FIND word: If word is found in hash table, do and print nothing. If word is NOT found, print WORD NUM_COLLISIONS
SPELLCHECK filename: Find every word in the file filename and store all NOT found words and the associated number of collisions in the badwords STL map. Then print size of badwords on its own line and iterate through badwords, printing every WORD NUM_COLLISIONS pair

2. Programming tasks

All of the file reading and command parsing described above is handled for you.

The Chain_String_HT constructor and print() functions are written for you. You will have to write the constructor for Probe_String_HT, but print() is optional -- you might find it useful for debugging.

These functions should be written by you in both Chain_String_HT and Probe_String_HT and will be directly tested. The Chain version and the Probe version of each function will be worth half of the points below.

[1 points] void insert(string)
[1.5 points] void remove(string)
[1 point] bool find(string, int &) (the second argument is num_collisions, allowing you to report back how many iterations collision resolution required)
[1.5 points] expand_and_rehash()

You may work with a (human) partner on this lab. AI is allowed for the expand_and_rehash() function only

3. Submission

Submit 2 files to Gradescope: (1) your README and (2) your modified main.cpp. The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. main.cpp should also contain your name and per-function comments on AI usage.

@@ Line 8: / Line 8: @@
 The executable is called as follows: <tt>hashcheck <command filename></tt>.  Each command file initializes a hash table from a dictionary on the first line:
-* <tt>CHAIN filename</tt>: Count words in <tt>filename</tt>, compute appropriately-size hash table, call constructor of <tt>Chain_String_HT</tt> class which uses ''separate chaining'' for collision resolution, and insert every word into that hash table.  Only output is printing <tt>WORD_COUNT TABLE_SIZE</tt>
+* <tt>CHAIN filename</tt>: Count words in <tt>filename</tt>, compute appropriately-size hash table, call constructor of <tt>Chain_String_HT</tt> class which uses ''separate chaining'' for collision resolution, and insert every word into that hash table.  Chains are implemented with STL <tt>vector</tt> class.  Only output is printing the dictionary hash table's <tt>WORD_COUNT TABLE_SIZE</tt> followed by a newline
-* <tt>PROBE filename</tt>: Same as previous command
+* <tt>PROBE filename</tt>: Same as previous command but call constructor for <tt>Probe_String_HT</tt> class which uses ''quadratic probing'' for collision resolution.  This scheme should use lazy deletion and follow the probing sequence ''f(1) = +1'', ''f(2) = -4'', ''f(3) = +9'', ''f(4) = -16'', and so on.
 The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object:
-* <tt>INSERT word/tt>: Using DFS, print "CONNECTED" if <tt>w1</tt> and <tt>w2</tt> are connected, "NOT CONNECTED" otherwise
+* <tt>INSERT word</tt>: Insert <tt>word</tt> into hash table following appropriate collision scheme.  Update word count and table size variables as necessary.  If the <tt>word</tt> is already in the table, don't insert a duplicate.  If the new load factor would be <tt>>= MAX_LOAD</tt> then re-hash by calling <tt>expand_and_rehash()</tt>.  Once again, only output (after the insertion and bookkeeping) <tt>WORD_COUNT TABLE_SIZE</tt>
-* <tt>REMOVE word</tt>: Using DFS, print number of words that are connected to <tt>w</tt>.  <tt>w</tt> itself counts as 1
+* <tt>REMOVE word</tt>: Remove <tt>word</tt> from hash table if it is there.  Again output (after the removal and bookkeeping) <tt>WORD_COUNT TABLE_SIZE</tt>
-* <tt>FIND word</tt>: Using BFS, print length of shortest path connecting <tt>w1</tt> and <tt>w2</tt>.  If they are not connected, print "NOT CONNECTED"
+* <tt>FIND word</tt>: If <tt>word</tt> is found in hash table, do and print nothing.  If <tt>word</tt> is NOT found, print <tt>WORD NUM_COLLISIONS</tt>
-* <tt>SPELLCHECK filename</tt>:  Using BFS, print sequence of words on that path (one per line starting with <tt>w1</tt> and ending with <tt>w2</tt>).  If they are not connected, print "NOT CONNECTED"
+* <tt>SPELLCHECK filename</tt>:  Find every word in the file <tt>filename</tt> and store all NOT found words and the associated number of collisions in the <tt>badwords</tt> STL <tt>map</tt>.  Then print size of <tt>badwords</tt> on its own line and iterate through <tt>badwords</tt>, printing every <tt>WORD NUM_COLLISIONS</tt> pair
 ===2. Programming tasks===
-These are core <tt>String_HT</tt> functions that are required but not directly tested.  Since they don't generate output but have side effects on the data structures above, every print function should either call one of these or use the modified data structure(s).
+All of the file reading and command parsing described above is handled for you.
-* <tt>calculate_neighbor_words(string &)</tt>
+The <tt>Chain_String_HT</tt> constructor and <tt>print()</tt> functions are written for you.  You will have to write the constructor for <tt>Probe_String_HT</tt>, but <tt>print()</tt> is optional -- you might find it useful for debugging.
-* <tt>DFS_traversal(string &)</tt>
-* <tt>BFS_traversal(string &)</tt>
-These <tt>String_HT</tt> functions will be directly tested and correspond one-to-one with the commands listed above:
+These functions should be written by you in both <tt>Chain_String_HT</tt> and <tt>Probe_String_HT</tt> and will be directly tested.  The <tt>Chain</tt> version and the <tt>Probe</tt> version of each function will be worth half of the points below.
-* '''[0.5 points]''' <tt>print_num_neighbors(string &)</tt>
+* '''[1 points]''' <tt>void insert(string)</tt>
-* '''[0.5 points]''' <tt>print_neighbors(string &)</tt>
+* '''[1.5 points]''' <tt>void remove(string)</tt>
-* '''[1 point]''' <tt>DFS_print_connected(string &, string &)</tt>
+* '''[1 point]''' <tt>bool find(string, int &)</tt> (the second argument is <tt>num_collisions</tt>, allowing you to report back how many iterations collision resolution required)
-* '''[1 point]''' <tt>DFS_print_num_connected(string &)</tt>
+* '''[1.5 points]''' <tt>expand_and_rehash()</tt>
-* '''[0.5 points]''' <tt>BFS_print_path_length(string &, string &)</tt>
-* '''[1 point]''' <tt>BFS_print_path(string &, string &)</tt>
-* '''[0.5 points]''' <tt>BFS_print_longest_path()</tt>
-'''You may use AI on any part of this lab, but no human partner.'''
+'''You may work with a (human) partner on this lab.  AI is allowed for the <tt>expand_and_rehash()</tt> function only'''
 ===3. Submission===
 Submit 2 files to Gradescope: (1) your README and (2) your modified <tt>main.cpp</tt>.  The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. <tt>main.cpp</tt> should also contain your name and per-function comments on AI usage.

Difference between revisions of "CISC220 F2023 Lab10"

Latest revision as of 09:25, 5 December 2023

Contents

Lab #10

1. Hash tables

2. Programming tasks

3. Submission

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools