Write a program to read protein sequences from a file, count them, and allow retrieval of any single protein sequence. Read in the proteins and store them in a hash table. You do not know what the proteins are ahead of time (pretend that the input dataset may change), so you will have to resolve collisions. The input file is very large, but you happen to know that each protein is fewer than 30 amino acids long, so each one fits in a 30-character string (up to 29 letters plus the terminating '\0'). You also know that the file contains many copies of fewer than 20 unique proteins. You can therefore use a data array with 40 elements, twice as much space as you need, to reduce the number of collisions. Each element contains the key value itself (the protein) and the number of times it occurs in the input file (the count). Use the following data structure:
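struct arrayelement {
    char protein[30];   // the key: the protein sequence, '\0'-terminated
    int count;          // occurrences in the input file (a count of 0 can mark an empty slot)
};
arrayelement proteins[40];   // C++; in C, write: struct arrayelement proteins[40];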
The hash function is:
h(key) = (first_letter_of_key + 2 * last_letter_of_key) % 40, where A = 0, B = 1, …, Z = 25.
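The hash function is small enough to check by hand. For example, BIKFPLVHANQHVDNSVRWGIKDW starts with B = 1 and ends with W = 22, so h = (1 + 2 * 22) % 40 = 45 % 40 = 5. The raw sum can reach 25 + 2 * 25 = 75, so the % 40 is what keeps the index inside proteins[40]. Below is a minimal C++ sketch, assuming every key is a non-empty string of upper-case letters A to Z. The names `letterValue`, `proteinHash`, and `TABLE_SIZE` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

const int TABLE_SIZE = 40;

// Maps 'A'..'Z' to 0..25, as the assignment specifies.
int letterValue(char c) {
    assert(c >= 'A' && c <= 'Z');  // the mapping is only defined for upper-case letters
    return c - 'A';
}

// h(key) = (first_letter_of_key + 2 * last_letter_of_key) % 40
int proteinHash(const char* key) {
    std::size_t len = std::strlen(key);
    assert(len > 0);
    return (letterValue(key[0]) + 2 * letterValue(key[len - 1])) % TABLE_SIZE;
}
```

By the same arithmetic, AWGKKKTKTQFQFPTADANCDCDD hashes to (0 + 2 * 3) % 40 = 6.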
Generate output of the form:
Protein Count
BIKFPLVHANQHVDNSVRWGIKDW 5929
AWGKKKTKTQFQFPTADANCDCDD 7865
(one line for each unique protein)
Please enter a sequence: AWGKKKTKTQFQFPTADANCDCDD 7865 FOUND
Please enter a sequence: LADYGAGABORNTHISWAY NOT FOUND
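The lookup behind these prompts has to follow the same probe sequence that the insert used. It can stop at the first empty slot because nothing is ever deleted, so a key that was inserted can never sit beyond an empty slot on its probe path. Below is a minimal C++ sketch. It assumes the table starts zeroed, uses a count of 0 to mark an empty slot, and treats only upper-case keys as valid. The names `findCount` and `isUpperLetter` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

struct arrayelement {
    char protein[30];
    int count;   // 0 marks an empty slot (assumption of this sketch)
};

const int TABLE_SIZE = 40;

bool isUpperLetter(char c) { return c >= 'A' && c <= 'Z'; }

// h(key) = (first + 2 * last) % 40; callers must pass an upper-case key.
int proteinHash(const char* key) {
    std::size_t len = std::strlen(key);
    assert(len > 0 && isUpperLetter(key[0]) && isUpperLetter(key[len - 1]));
    return ((key[0] - 'A') + 2 * (key[len - 1] - 'A')) % TABLE_SIZE;
}

// Returns the stored count for key, or 0 if it is not in the table.
int findCount(const arrayelement table[], const char* key) {
    std::size_t len = std::strlen(key);
    if (len == 0 || !isUpperLetter(key[0]) || !isUpperLetter(key[len - 1]))
        return 0;                                      // the hash is undefined for such keys
    int index = proteinHash(key);
    for (int probes = 0; probes < TABLE_SIZE; ++probes) {
        if (table[index].count == 0) return 0;         // empty slot: not found
        if (std::strcmp(table[index].protein, key) == 0)
            return table[index].count;                 // found
        index = (index + 1) % TABLE_SIZE;              // collision: try the next slot
    }
    return 0;                                          // searched every slot
}
```

A driver would print the returned count followed by FOUND when `findCount` returns a positive value, and NOT FOUND otherwise, matching the sample lines above.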
// The file processing algorithm
While(there are proteins)
    Read in a protein
    Hash the initial index into the proteins table
    While(forever)
        If(found key in table)
            Increment count
            Break;
        If(found empty spot in table)
            Copy key into table
            Increment count
            Break;
        Increment index, wrapping around to 0 after 39;  // collision! Try the next spot!
This is the link to the proteins.txt file: http://wserver.flc.losrios.edu/~ross/CISP430S16.SPLWQKFGXWKTGZHS.BOB/proteins.txt
This code should be written in C or C++. Please include the program's output.
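The file-processing pseudocode can be sketched in C++ as shown below. This is a minimal version under stated assumptions, not a definitive solution:

- The table starts zero-initialized, and a count of 0 marks an empty slot.
- Proteins are whitespace-separated and upper-case.
- Words that break the "fewer than 30 letters" or A to Z assumptions are skipped.
- The names `insertProtein`, `loadProteins`, and `printTable` are hypothetical.

The bounded `for` loop replaces `While(forever)`, so a full table is reported instead of looping endlessly.

```cpp
#include <cassert>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <string>

struct arrayelement {
    char protein[30];  // up to 29 letters plus the terminating '\0'
    int count;         // 0 marks an empty slot (assumption of this sketch)
};

const int TABLE_SIZE = 40;

// h(key) = (first + 2 * last) % 40 with A = 0 ... Z = 25; assumes upper-case A-Z.
int proteinHash(const char* key) {
    std::size_t len = std::strlen(key);
    return ((key[0] - 'A') + 2 * (key[len - 1] - 'A')) % TABLE_SIZE;
}

// The inner While(forever) loop with linear probing; returns the slot used.
int insertProtein(arrayelement table[], const char* key) {
    int index = proteinHash(key);
    for (int probes = 0; probes < TABLE_SIZE; ++probes) {
        arrayelement& slot = table[index];
        if (slot.count > 0 && std::strcmp(slot.protein, key) == 0) {
            ++slot.count;                            // found key: increment count
            return index;
        }
        if (slot.count == 0) {                       // found empty spot: copy key in
            std::strncpy(slot.protein, key, sizeof slot.protein - 1);
            slot.protein[sizeof slot.protein - 1] = '\0';
            slot.count = 1;
            return index;
        }
        index = (index + 1) % TABLE_SIZE;            // collision: try the next spot
    }
    assert(false && "hash table is full");          // not expected with < 20 unique keys
    return -1;
}

// Reads whitespace-separated proteins (>> also skips '\r' from Windows line endings).
// Returns how many proteins were counted.
long loadProteins(std::istream& in, arrayelement table[]) {
    long total = 0;
    std::string word;
    while (in >> word) {
        bool ok = word.size() < sizeof table[0].protein &&
                  word.front() >= 'A' && word.front() <= 'Z' &&
                  word.back() >= 'A' && word.back() <= 'Z';
        if (!ok) continue;                           // breaks the stated assumptions
        insertProtein(table, word.c_str());
        ++total;
    }
    return total;
}

// Prints the "Protein Count" report, one line per occupied slot.
void printTable(std::ostream& out, const arrayelement table[]) {
    out << std::left << std::setw(32) << "Protein" << "Count\n";
    for (int i = 0; i < TABLE_SIZE; ++i)
        if (table[i].count > 0)
            out << std::setw(32) << table[i].protein << table[i].count << '\n';
}
```

A `main` would declare `arrayelement proteins[TABLE_SIZE] = {};` and open proteins.txt with `std::ifstream`. It would then call `loadProteins`, print the total, and call `printTable(std::cout, proteins)`. Finally, it would loop reading sequences from `std::cin`, probing the same way to report FOUND or NOT FOUND.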
Part 2
Write a program to read keywords from a file, count them by inserting them into a PERFECT hash table, and allow retrieval of any word. Use the input file keywords.txt.
Use perfect hashing: you will first need to generate perfect-hash lookup tables. The input file contains duplicate values, so before you can build those tables you will need to remove the duplicates "manually". MS Access can do this easily, MS Excel slightly less easily, or you can do it the hard way by writing a program. Once you have isolated the unique keys, create the perfect-hash lookup tables using whatever combination of manual and automated methods you see fit.
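The "hard way" can automate both steps: removing duplicates and generating the lookup tables. The sketch below uses a simplified hash-and-displace scheme. This technique is swapped in for illustration and is not necessarily the method your course expects; Cichelli's letter-value tables are one common alternative.

1. The unique keys are grouped into n buckets with a seed-0 hash.
2. Each multi-key bucket gets a seed under which all of its keys land in distinct free slots.
3. Each single-key bucket stores its slot directly, encoded as a negative number.

The result is a minimal perfect hash: n keys map to n slots. The FNV-1a-based `seededHash` and all the names below (`PerfectHash`, `buildPerfectHash`, `lookup`, `countWord`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Seeded FNV-1a string hash; any seeded hash would do (assumption of this sketch).
std::uint32_t seededHash(std::uint32_t seed, const std::string& s) {
    std::uint32_t h = 2166136261u ^ seed;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

struct PerfectHash {
    std::vector<int> displace;       // per bucket: seed > 0, or -(slot + 1) for a single key
    std::vector<std::string> keys;   // the key stored in each slot, used to reject non-keys
    std::vector<int> counts;         // occurrences counted per slot
};

// Removes duplicates, then builds a minimal perfect hash (n unique keys -> n slots).
PerfectHash buildPerfectHash(const std::vector<std::string>& words) {
    std::set<std::string> uniq(words.begin(), words.end());
    const std::size_t n = uniq.size();
    PerfectHash ph{std::vector<int>(n, 0), std::vector<std::string>(n), std::vector<int>(n, 0)};
    std::vector<std::vector<std::string>> buckets(n);
    for (const std::string& k : uniq) buckets[seededHash(0, k) % n].push_back(k);

    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return buckets[a].size() > buckets[b].size();   // place the biggest buckets first
    });

    std::vector<bool> used(n, false);
    std::size_t next = 0, freeSlot = 0;
    for (; next < n && buckets[order[next]].size() > 1; ++next) {
        const std::vector<std::string>& bucket = buckets[order[next]];
        for (std::uint32_t seed = 1;; ++seed) {       // search for a collision-free seed
            std::vector<std::size_t> slots;
            for (const std::string& k : bucket) {
                std::size_t s = seededHash(seed, k) % n;
                if (used[s] || std::count(slots.begin(), slots.end(), s)) break;
                slots.push_back(s);
            }
            if (slots.size() != bucket.size()) continue;
            for (std::size_t j = 0; j < slots.size(); ++j) { used[slots[j]] = true; ph.keys[slots[j]] = bucket[j]; }
            ph.displace[order[next]] = static_cast<int>(seed);
            break;
        }
    }
    for (; next < n && buckets[order[next]].size() == 1; ++next) {   // single keys fill the gaps
        while (used[freeSlot]) ++freeSlot;
        used[freeSlot] = true;
        ph.keys[freeSlot] = buckets[order[next]][0];
        ph.displace[order[next]] = -static_cast<int>(freeSlot) - 1;
    }
    assert(std::find(used.begin(), used.end(), false) == used.end());  // every slot filled
    return ph;
}

// Returns the word's slot, or -1 if the word is not one of the hashed keys.
int lookup(const PerfectHash& ph, const std::string& word) {
    const std::size_t n = ph.keys.size();
    if (n == 0) return -1;
    int d = ph.displace[seededHash(0, word) % n];
    std::size_t slot = d < 0 ? static_cast<std::size_t>(-d - 1)
                             : seededHash(static_cast<std::uint32_t>(d), word) % n;
    return ph.keys[slot] == word ? static_cast<int>(slot) : -1;
}

// Counts one word from the original file; returns false for unknown words.
bool countWord(PerfectHash& ph, const std::string& word) {
    int slot = lookup(ph, word);
    if (slot < 0) return false;
    ++ph.counts[slot];
    return true;
}
```

After the tables are built from the unique keys, every word of the original keywords.txt (duplicates included) goes through `countWord`. The displacement table, rather than a probe sequence, guarantees that each keyword owns exactly one slot.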
Run your final program on the ORIGINAL input file keywords.txt.
Please use C++ or C.