CISC220 F2023 Lab10

Revision as of 15:04, 30 November 2023

Lab #10

1. Hash tables

Starter code for an abstract class String_HT and two derived classes Chain_String_HT and Probe_String_HT is provided here. Several dictionaries ranging in size from 100 to ~84K English words, as well as text files for testing spellchecking, are also supplied along with the code.

The executable is called as follows: hashcheck <command filename>. The first line of each command file initializes a hash table from a dictionary:

  • CHAIN filename: Count the words in filename, compute an appropriately sized hash table, call the constructor of the Chain_String_HT class (which uses separate chaining for collision resolution), and insert every word into that hash table. Chains are implemented with the STL vector class. The only output is the dictionary hash table's WORD_COUNT TABLE_SIZE, followed by a newline.
  • PROBE filename: Same as previous command but call constructor for Probe_String_HT class which uses quadratic probing for collision resolution. This scheme should use lazy deletion and follow the probing sequence f(1) = +1, f(2) = -4, f(3) = +9, f(4) = -16, and so on.
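The alternating probe sequence and lazy deletion can be illustrated with a minimal C++ sketch. It assumes that probe i lands at (home + f(i)) mod TABLE_SIZE, measured from the home slot, and that std::hash stands in for the starter code's hash function. ProbeSketch, probe_offset, probe_index, SlotState and kNotFound are hypothetical names, not the starter code's Probe_String_HT interface, and counting every non-matching occupied slot as a collision is only one possible reading of NUM_COLLISIONS.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Offset of the i-th probe from the home slot: 0, +1, -4, +9, -16, ...
// (+i*i for odd i, -i*i for even i).
long long probe_offset(long long i) {
    long long sq = i * i;
    return (i % 2 == 1) ? sq : -sq;
}

// Slot for the i-th probe, assuming home + f(i), wrapped into [0, size)
// even when the offset is negative.
std::size_t probe_index(std::size_t home, long long i, std::size_t size) {
    long long n = static_cast<long long>(size);
    long long idx = (static_cast<long long>(home) + probe_offset(i)) % n;
    if (idx < 0) idx += n;
    return static_cast<std::size_t>(idx);
}

enum class SlotState { EMPTY, ACTIVE, DELETED };

struct Slot {
    std::string word;
    SlotState state = SlotState::EMPTY;
};

// Sentinel returned by locate() when the word is absent (hypothetical).
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Hypothetical stand-in for Probe_String_HT.
class ProbeSketch {
public:
    explicit ProbeSketch(std::size_t table_size) : slots_(table_size) {
        assert(table_size > 0);
    }

    // FIND: collisions counts every ACTIVE or DELETED slot visited that is not the word.
    bool find(const std::string& word, int& collisions) const {
        collisions = 0;
        return locate(word, collisions) != kNotFound;
    }

    // INSERT: no duplicates; EMPTY and DELETED slots are both reusable.
    bool insert(const std::string& word) {
        int ignored = 0;
        if (locate(word, ignored) != kNotFound) return false;
        std::size_t home = home_of(word);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[probe_index(home, static_cast<long long>(i), slots_.size())];
            if (s.state != SlotState::ACTIVE) {
                s.word = word;
                s.state = SlotState::ACTIVE;
                ++word_count_;
                return true;
            }
        }
        return false;  // probe path full: the table should have been rehashed sooner
    }

    // REMOVE: lazy deletion marks the slot DELETED (not EMPTY) so later
    // searches keep probing past it.
    bool remove(const std::string& word) {
        int ignored = 0;
        std::size_t idx = locate(word, ignored);
        if (idx == kNotFound) return false;
        slots_[idx].state = SlotState::DELETED;
        --word_count_;
        return true;
    }

    std::size_t word_count() const { return word_count_; }
    std::size_t table_size() const { return slots_.size(); }

private:
    std::size_t home_of(const std::string& word) const {
        return std::hash<std::string>{}(word) % slots_.size();
    }

    // Walks the probe sequence; reaching an EMPTY slot proves the word is absent.
    std::size_t locate(const std::string& word, int& collisions) const {
        std::size_t home = home_of(word);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t idx = probe_index(home, static_cast<long long>(i), slots_.size());
            const Slot& s = slots_[idx];
            if (s.state == SlotState::EMPTY) return kNotFound;
            if (s.state == SlotState::ACTIVE && s.word == word) return idx;
            ++collisions;
        }
        return kNotFound;
    }

    std::vector<Slot> slots_;
    std::size_t word_count_ = 0;
};
```

This alternating sequence is not guaranteed to reach every slot for every table size (with 4 slots, for example, it only ever visits two), so insert can fail on a crowded table; keeping the load factor below MAX_LOAD is what keeps probe paths short.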

The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object:

  • INSERT word: Insert word into the hash table following the appropriate collision scheme. Update the word count and table size variables as necessary. If word is already in the table, don't insert a duplicate. If the new load factor would be >= MAX_LOAD, rehash by calling expand_and_rehash(). As before, the only output (after the insertion and bookkeeping) is WORD_COUNT TABLE_SIZE
  • REMOVE word: Remove word from the hash table if it is there. Again, the only output (after the removal and bookkeeping) is WORD_COUNT TABLE_SIZE
  • FIND word: If word is found in the hash table, print nothing. If word is NOT found, print WORD NUM_COLLISIONS
  • SPELLCHECK filename: Look up every word in the file filename and store each word that is NOT found, along with its number of collisions, in the badwords map. Then print the size of badwords on its own line and iterate through badwords, printing every WORD NUM_COLLISIONS pair
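The four commands can be sketched against a separately chained table in C++. This is a minimal illustration that assumes std::hash as the hash function, a hypothetical MAX_LOAD of 0.75 (kMaxLoadSketch) and a hypothetical 2n + 1 growth rule in expand_and_rehash(). ChainSketch, spellcheck and print_badwords are illustrative names, not the starter code's Chain_String_HT interface. Reading and tokenizing the SPELLCHECK file is left out (the words are assumed to be already split), and counting the non-matching chain entries examined is only one possible interpretation of NUM_COLLISIONS.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical threshold; the starter code defines its own MAX_LOAD.
constexpr double kMaxLoadSketch = 0.75;

// Hypothetical stand-in for Chain_String_HT; each chain is an STL vector.
class ChainSketch {
public:
    explicit ChainSketch(std::size_t table_size) : buckets_(table_size) {
        assert(table_size > 0);
    }

    // INSERT: skip duplicates, then rehash once the load factor reaches the limit.
    void insert(const std::string& word) {
        int ignored = 0;
        if (find(word, ignored)) return;
        buckets_[home_of(word)].push_back(word);
        ++word_count_;
        if (static_cast<double>(word_count_) / buckets_.size() >= kMaxLoadSketch) {
            expand_and_rehash();
        }
    }

    // REMOVE: erase the word from its chain if it is there.
    void remove(const std::string& word) {
        std::vector<std::string>& chain = buckets_[home_of(word)];
        auto it = std::find(chain.begin(), chain.end(), word);
        if (it != chain.end()) {
            chain.erase(it);
            --word_count_;
        }
    }

    // FIND: collisions counts chain entries compared that were not the word.
    bool find(const std::string& word, int& collisions) const {
        collisions = 0;
        for (const std::string& w : buckets_[home_of(word)]) {
            if (w == word) return true;
            ++collisions;
        }
        return false;
    }

    // SPELLCHECK core: map each word that is not found to its collision count.
    std::map<std::string, int> spellcheck(const std::vector<std::string>& words) const {
        std::map<std::string, int> badwords;
        for (const std::string& w : words) {
            int collisions = 0;
            if (!find(w, collisions)) badwords[w] = collisions;
        }
        return badwords;
    }

    // Prints the map size, then one "WORD NUM_COLLISIONS" line per entry.
    static void print_badwords(const std::map<std::string, int>& badwords) {
        std::cout << badwords.size() << '\n';
        for (const auto& entry : badwords) {
            std::cout << entry.first << ' ' << entry.second << '\n';
        }
    }

    std::size_t word_count() const { return word_count_; }
    std::size_t table_size() const { return buckets_.size(); }

private:
    std::size_t home_of(const std::string& word) const {
        return std::hash<std::string>{}(word) % buckets_.size();
    }

    // Hypothetical growth rule (2n + 1); the starter code may choose primes instead.
    void expand_and_rehash() {
        std::vector<std::vector<std::string>> old;
        old.swap(buckets_);
        buckets_.resize(2 * old.size() + 1);
        for (const auto& chain : old) {
            for (const std::string& w : chain) buckets_[home_of(w)].push_back(w);
        }
    }

    std::vector<std::vector<std::string>> buckets_;
    std::size_t word_count_ = 0;
};
```

With a 4-slot table and this threshold, the third distinct insertion brings the load factor to 3/4 and triggers a rehash to 9 slots. Because std::map iterates in key order, print_badwords emits the pairs alphabetically.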

2. Programming tasks

These are core String_HT functions that are required but not directly tested. Since they don't generate output but have side effects on the data structures above, every print function should either call one of these or use the modified data structure(s).

  • calculate_neighbor_words(string &)
  • DFS_traversal(string &)
  • BFS_traversal(string &)

These String_HT functions will be directly tested and correspond one-to-one with the commands listed above:

  • [0.5 points] print_num_neighbors(string &)
  • [0.5 points] print_neighbors(string &)
  • [1 point] DFS_print_connected(string &, string &)
  • [1 point] DFS_print_num_connected(string &)
  • [0.5 points] BFS_print_path_length(string &, string &)
  • [1 point] BFS_print_path(string &, string &)
  • [0.5 points] BFS_print_longest_path()

You may use AI on any part of this lab, but you may not work with a human partner.

3. Submission

Submit 2 files to Gradescope: (1) your README and (2) your modified main.cpp. The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. main.cpp should also contain your name and per-function comments on AI usage.