Difference between revisions of "CISC220 F2023 Lab10"
(→1. Hash tables) |
(→1. Hash tables) |
||
(16 intermediate revisions by the same user not shown) | |||
Line 8: | Line 8: | ||
The executable is called as follows: <tt>hashcheck <command filename></tt>. Each command file initializes a hash table from a dictionary on the first line: | The executable is called as follows: <tt>hashcheck <command filename></tt>. Each command file initializes a hash table from a dictionary on the first line: | ||
− | * <tt>CHAIN filename</tt>: Count words in <tt>filename</tt>, compute appropriately-size hash table, call constructor of <tt>Chain_String_HT</tt> class which uses ''separate chaining'' for collision resolution, and insert every word into that hash table. Only output is printing <tt>WORD_COUNT TABLE_SIZE</tt> | + | * <tt>CHAIN filename</tt>: Count words in <tt>filename</tt>, compute appropriately-size hash table, call constructor of <tt>Chain_String_HT</tt> class which uses ''separate chaining'' for collision resolution, and insert every word into that hash table. Chains are implemented with STL <tt>vector</tt> class. Only output is printing the dictionary hash table's <tt>WORD_COUNT TABLE_SIZE</tt> followed by a newline |
− | * <tt>PROBE filename</tt>: Same as previous command | + | * <tt>PROBE filename</tt>: Same as previous command but call constructor for <tt>Probe_String_HT</tt> class which uses ''quadratic probing'' for collision resolution. This scheme should use lazy deletion and follow the probing sequence ''f(1) = +1'', ''f(2) = -4'', ''f(3) = +9'', ''f(4) = -16'', and so on. |
The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object: | The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object: | ||
− | * <tt>INSERT word/tt>: | + | * <tt>INSERT word</tt>: Insert <tt>word</tt> into hash table following appropriate collision scheme. Update word count and table size variables as necessary. If the <tt>word</tt> is already in the table, don't insert a duplicate. If the new load factor would be <tt>>= MAX_LOAD</tt> then re-hash by calling <tt>expand_and_rehash()</tt>. Once again, only output (after the insertion and bookkeeping) <tt>WORD_COUNT TABLE_SIZE</tt> |
− | * <tt>REMOVE word</tt>: | + | * <tt>REMOVE word</tt>: Remove <tt>word</tt> from hash table if it is there. Again output (after the removal and bookkeeping) <tt>WORD_COUNT TABLE_SIZE</tt> |
− | * <tt>FIND word</tt>: | + | * <tt>FIND word</tt>: If <tt>word</tt> is found in hash table, do and print nothing. If <tt>word</tt> is NOT found, print <tt>WORD NUM_COLLISIONS</tt> |
− | * <tt>SPELLCHECK filename</tt>: | + | * <tt>SPELLCHECK filename</tt>: Find every word in the file <tt>filename</tt> and store all NOT found words and the associated number of collisions in the <tt>badwords</tt> STL <tt>map</tt>. Then print size of <tt>badwords</tt> on its own line and iterate through <tt>badwords</tt>, printing every <tt>WORD NUM_COLLISIONS</tt> pair |
===2. Programming tasks=== | ===2. Programming tasks=== | ||
− | + | All of the file reading and command parsing described above is handled for you. | |
− | + | The <tt>Chain_String_HT</tt> constructor and <tt>print()</tt> functions are written for you. You will have to write the constructor for <tt>Probe_String_HT</tt>, but <tt>print()</tt> is optional -- you might find it useful for debugging. | |
− | |||
− | |||
− | These <tt> | + | These functions should be written by you in both <tt>Chain_String_HT</tt> and <tt>Probe_String_HT</tt> and will be directly tested. The <tt>Chain</tt> version and the <tt>Probe</tt> version of each function will be worth half of the points below. |
− | * '''[ | + | * '''[1 points]''' <tt>void insert(string)</tt> |
− | * '''[ | + | * '''[1.5 points]''' <tt>void remove(string)</tt> |
− | * '''[1 point]''' <tt> | + | * '''[1 point]''' <tt>bool find(string, int &)</tt> (the second argument is <tt>num_collisions</tt>, allowing you to report back how many iterations collision resolution required) |
− | + | * '''[1.5 points]''' <tt>expand_and_rehash()</tt> | |
− | |||
− | * '''[1 | ||
− | |||
− | '''You may | + | '''You may work with a (human) partner on this lab. AI is allowed for the <tt>expand_and_rehash()</tt> function only''' |
===3. Submission=== | ===3. Submission=== | ||
Submit 2 files to Gradescope: (1) your README and (2) your modified <tt>main.cpp</tt>. The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. <tt>main.cpp</tt> should also contain your name and per-function comments on AI usage. | Submit 2 files to Gradescope: (1) your README and (2) your modified <tt>main.cpp</tt>. The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. <tt>main.cpp</tt> should also contain your name and per-function comments on AI usage. |
Latest revision as of 10:25, 5 December 2023
Lab #10
1. Hash tables
Starter code for an abstract class String_HT and two derived classes Chain_String_HT and Probe_String_HT is provided here. Several dictionaries ranging in size from 100 to ~84K English words, as well as text files for testing spellchecking, are also supplied along with the code.
The executable is called as follows: hashcheck <command filename>. Each command file initializes a hash table from a dictionary on the first line:
- CHAIN filename: Count words in filename, compute appropriately-size hash table, call constructor of Chain_String_HT class which uses separate chaining for collision resolution, and insert every word into that hash table. Chains are implemented with STL vector class. Only output is printing the dictionary hash table's WORD_COUNT TABLE_SIZE followed by a newline
- PROBE filename: Same as previous command but call constructor for Probe_String_HT class which uses quadratic probing for collision resolution. This scheme should use lazy deletion and follow the probing sequence f(1) = +1, f(2) = -4, f(3) = +9, f(4) = -16, and so on.
The initialization line is followed by 0 or more 1-argument "commands" (one per line) which will trigger calls to member functions of the hash table object:
- INSERT word: Insert word into hash table following appropriate collision scheme. Update word count and table size variables as necessary. If the word is already in the table, don't insert a duplicate. If the new load factor would be >= MAX_LOAD then re-hash by calling expand_and_rehash(). Once again, only output (after the insertion and bookkeeping) WORD_COUNT TABLE_SIZE
- REMOVE word: Remove word from hash table if it is there. Again output (after the removal and bookkeeping) WORD_COUNT TABLE_SIZE
- FIND word: If word is found in hash table, do and print nothing. If word is NOT found, print WORD NUM_COLLISIONS
- SPELLCHECK filename: Find every word in the file filename and store all NOT found words and the associated number of collisions in the badwords STL map. Then print size of badwords on its own line and iterate through badwords, printing every WORD NUM_COLLISIONS pair
2. Programming tasks
All of the file reading and command parsing described above is handled for you.
The Chain_String_HT constructor and print() functions are written for you. You will have to write the constructor for Probe_String_HT, but print() is optional -- you might find it useful for debugging.
These functions should be written by you in both Chain_String_HT and Probe_String_HT and will be directly tested. The Chain version and the Probe version of each function will be worth half of the points below.
- [1 points] void insert(string)
- [1.5 points] void remove(string)
- [1 point] bool find(string, int &) (the second argument is num_collisions, allowing you to report back how many iterations collision resolution required)
- [1.5 points] expand_and_rehash()
You may work with a (human) partner on this lab. AI is allowed for the expand_and_rehash() function only
3. Submission
Submit 2 files to Gradescope: (1) your README and (2) your modified main.cpp. The README should contain your name, complete declarations of AI use, notes on any limitations or issues with your code, and your interpretation of any ambiguities in the assignment instructions. main.cpp should also contain your name and per-function comments on AI usage.