Multiple choice question for engineering
1. Which of the following is a wrong statement?
a) To assume biological activity, many nascent polypeptides have to be covalently modified before or after the folding process
b) In eukaryotic cells most modifications take place in the endoplasmic reticulum and the Golgi apparatus
c) The modifications in eukaryotic cells include proteolytic cleavage; formation of disulfide bonds; addition of phosphoryl, methyl, acetyl, or other groups onto certain amino acid residues
d) The modifications in eukaryotic cells do not include attachment of oligosaccharides or prosthetic groups to create mature proteins
Answer: d [Reason:] Posttranslational modifications have a great impact on protein function by altering the size, hydrophobicity and overall conformation of the proteins. The modifications can directly influence protein–protein interactions and distribution of proteins to different subcellular locations.
2. Which of the following is wrong about AutoMotif?
a) It is a web server predicting protein sequence motifs
b) It doesn’t use SVM approach
c) In this process, the query sequence is chopped up into a number of overlapping fragments
d) The overlapping fragments from the query sequence are fed into different kernels (similar to nodes)
Answer: b [Reason:] AutoMotif uses the SVM approach. A hyperplane, which has been trained to recognize known protein sequence motifs, separates the kernels into different classes. Each separation is compared with known motif classes, most of which are related to posttranslational modification. The best match with a known class defines the functional motif.
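The fragment-chopping step in option c can be sketched as a sliding window. This is only an illustration: the function name, window length, and step size are assumptions, not AutoMotif's actual parameters.

```python
def overlapping_fragments(sequence, window, step=1):
    """Return (start, fragment) pairs for every window of length `window`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # A sequence shorter than the window yields no fragments.
    return [
        (start, sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Hypothetical query sequence; consecutive fragments overlap by window - step residues.
fragments = overlapping_fragments("MKTAYIAK", window=4)
```

Each fragment would then be handed to the classifier; that part is not shown here.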
3. It is important to use bioinformatics tools to predict sites for posttranslational modifications based on specific protein sequences. However, prediction of such modifications can often be difficult because of the short lengths of the sequence motifs associated with certain modifications.
Answer: a [Reason:] This often leads to many false-positive identifications. One such example is the known consensus motif for protein phosphorylation, [ST]-x-[RK]. Such a short motif can be found multiple times in almost every protein sequence. Most of the predictions based on this sequence motif alone are likely to be wrong, producing very high rates of false-positives.
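The false-positive problem can be seen by scanning a sequence for the [ST]-x-[RK] motif with a regular expression. The sketch below uses a made-up 14-residue peptide; the function name and the peptide are assumptions for illustration.

```python
import re

# [ST]-x-[RK]: Ser or Thr, any residue, then Arg or Lys.
# The lookahead lets overlapping occurrences be reported as well.
PHOSPHO_MOTIF = re.compile(r"(?=([ST].[RK]))")


def find_motif_sites(protein):
    """Return (1-based position, matched tripeptide) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in PHOSPHO_MOTIF.finditer(protein)]


# Hypothetical peptide, not a real protein.
sites = find_motif_sites("MSAKTPRSSRLSGK")
# sites == [(2, "SAK"), (5, "TPR"), (8, "SSR"), (12, "SGK")]
```

Four hits in fourteen residues, none of them supported by any other evidence, is the kind of over-prediction the answer describes.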
4. To minimize false-positive results, a statistical learning process called support vector machine (SVM) can be used to increase the specificity of prediction.
Answer: a [Reason:] This is a data classification method similar to the linear or quadratic discriminant analysis. In this method, the data are projected in a three-dimensional space or even a multidimensional space.
5. In a statistical learning process called support vector machine (SVM), a hyperplane is _____
a) a linear or nonlinear mathematical function
b) nonlinear mathematical function
c) linear mathematical function
d) exponential mathematical function
Answer: a [Reason:] The hyperplane is used to best separate true signals from noise. The algorithm can also include additional environmental variables that may be required for the enzymatic modification. After training the algorithm with sufficient structural features, it is able to correctly recognize many posttranslational modification patterns.
6. A disulfide bridge is a unique type of _____ modification in which _____ bonds are formed between cysteine residues.
a) posttranslational, covalent
b) translational, covalent
c) translational, ionic
d) posttranslational, ionic
Answer: a [Reason:] Disulfide bonds are important for maintaining the stability of certain types of proteins. Disulfide prediction is the prediction of the pairing potential or bonding states of cysteines in a protein.
7. Accurate prediction of _____ bonds may also help to predict the _____-dimensional structure of the protein of interest.
a) nitrogen, two
b) nitrogen, three
c) disulfide, three
d) oxygen, three
Answer: c [Reason:] This problem can be tackled by using profiles constructed from multiple sequence alignment. It can also be tackled by using residue contact potentials calculated based on the local sequence environment.
8. Only advanced neural networks are used to discern long-distance pairwise interactions among cysteine residues.
Answer: b [Reason:] Advanced neural networks or SVM or hidden Markov model (HMM) algorithms are often used to discern long-distance pairwise interactions among cysteine residues. Cysteine is one of the publicly available programs specialized in disulfide prediction.
9. Cysteine doesn’t make predictions by building profiles.
Answer: b [Reason:] Cysteine is a web server that predicts the disulfide bonding states of cysteine residues in a protein sequence by building profiles based on multiple sequence alignment information. A recursive neural network ranks the candidate residues for disulfide formation.
10. ExPASy contains a number of programs to determine posttranslational modifications based on MS molecular mass data.
Answer: a [Reason:] FindMod is a subprogram that uses experimentally determined peptide fingerprint information to compare the masses of the peptide fragments with those of theoretical peptides. If a difference is found, it predicts a particular type of modification based on a set of predefined rules. It can predict twenty-eight types of modifications, including methylation, phosphorylation, lipidation, and sulfation.
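The mass-difference idea can be sketched as a lookup of the observed-minus-theoretical mass against a table of known shifts. This is not FindMod's rule set: the three-entry table, the tolerance, and the function name are assumptions chosen for illustration.

```python
# Monoisotopic mass shifts in daltons; a small illustrative subset only.
MODIFICATION_SHIFTS = {
    "methylation": 14.01565,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}


def guess_modification(observed_mass, theoretical_mass, tolerance=0.02):
    """Return the mass difference and the modifications whose shift matches it."""
    delta = observed_mass - theoretical_mass
    candidates = [
        name
        for name, shift in MODIFICATION_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]
    return delta, candidates


# Hypothetical peptide masses: the observed peptide is about 79.97 Da heavier.
delta, candidates = guess_modification(1079.97, 1000.0)
```

With a larger table, several modifications can fall within the same tolerance window, which is why additional rules are needed to choose among them.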
1. The ab initio type of algorithm predicts prokaryotic and eukaryotic promoters and regulatory elements based on characteristic sequence patterns for promoters and regulatory elements.
Answer: a [Reason:] Some ab initio programs are signal based, relying on characteristic promoter sequences such as the TATA box. Other programs rely on content information such as hexamer frequencies.
2. The advantage of the ab initio method is that the sequence can be applied as such without having to obtain experimental information.
Answer: a [Reason:] The limitation is the need for training, which makes the prediction programs species specific. In addition, this type of method has difficulty discovering new, unknown motifs.
3. Which of the following is incorrect regarding the ab initio approaches?
a) The conventional approach to detecting a promoter or regulatory site is through matching a consensus sequence pattern represented by regular expressions
b) The conventional approach to detecting a promoter or regulatory site is through matching a position-specific scoring matrix constructed from well-characterized binding sites
c) The consensus sequences or the matrices are relatively short, covering 6 to 10 bases
d) The consensus sequences or the matrices are relatively large, covering 700 to 1000 bases
Answer: d [Reason:] To determine whether a query sequence matches a weight matrix, the sequence is scanned through the matrix. Scores of matches and mismatches at all matrix positions are summed up to give a log odds score, which is then evaluated for statistical significance. This simple approach, however, often has difficulty differentiating true promoters from random sequence matches and generates high rates of false positives as a result.
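The weight-matrix scan described above can be sketched as follows. The binding sites are toy data, and the uniform background frequency and pseudocount are assumptions; the statistical significance step is not shown.

```python
import math


def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a position-specific log-odds matrix from aligned sites of equal length."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def scan(sequence, matrix):
    """Slide the matrix along the sequence (A, C, G, T only) and sum the column scores."""
    width = len(matrix)
    return [
        (start, sum(matrix[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical aligned binding sites, 6 bases wide.
KNOWN_SITES = ["TATAAT", "TATAAT", "TACAAT", "TATACT", "GATAAT"]
matrix = build_log_odds(KNOWN_SITES)
hits = scan("GGCTATAATGCC", matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
```

Because the matrix is only six columns wide, a long genome contains many windows that score nearly as well as a true site, which is the source of the false positives.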
4. To improve the specificity of prediction, some algorithms selectively ______ coding regions and focus on the upstream regions ______, which are most likely to contain promoters. In that sense, promoter prediction and gene prediction are coupled.
a) include, (0.5 to 2.0 kb) only
b) include, (0.5 to 2.0 Mb) only
c) exclude, (0.5 to 2.0 Mb) only
d) exclude, (0.5 to 2.0 kb) only
Answer: d [Reason:] To better discriminate true motifs from background noise, a new generation of algorithms has been developed that takes into account the higher order correlation of multiple subtle features by using discriminant functions, neural networks, or hidden Markov models (HMMs) that are capable of incorporating more neighboring sequence information.
5. Operon prediction is less important in prokaryotic promoter prediction.
Answer: b [Reason:] One of the unique aspects in prokaryotic promoter prediction is the determination of operon structures, because genes within an operon share a common promoter located upstream of the first gene of the operon. Hence, operon prediction is the key in prokaryotic promoter prediction.
6. Once an operon structure is known, ______ for the presence of a promoter and regulatory elements, _____ in the operon do not possess such DNA elements.
a) only the first gene is predicted, whereas other genes
b) only the first hundred genes are predicted, whereas next few genes
c) only first two genes are predicted, whereas next few genes
d) only first ten genes are predicted, whereas next few genes
Answer: a [Reason:] Only the first gene is predicted for the presence of a promoter and regulatory elements, whereas other genes in the operon do not possess such DNA elements. There are a number of methods available for prokaryotic operon prediction. The most accurate is a set of simple rules developed.
7. Which of the following is correct regarding the method for prokaryotic operon prediction?
a) It relies on two kinds of information: gene orientation and intergenic distances of a pair of genes of interest and conserved linkage of the genes based on comparative genomic analysis
b) It relies only on the gene orientation and intergenic distances of a pair of genes of interest
c) It relies only on the conserved linkage of the genes based on comparative genomic analysis
d) The prediction cannot be done manually using the rules
Answer: a [Reason:] A scoring scheme is developed to assign operons with different levels of confidence. This method is claimed to produce accurate identification of an operon structure, which in turn facilitates the promoter prediction. The prediction can be done manually using the rules. The few dedicated programs for prokaryotic promoter prediction do not apply the rule for historical reasons. The most frequently used program is BPROM.
8. Which of the following is incorrect regarding BPROM?
a) It is a web-based program for prediction of bacterial promoters
b) It is a web-based program only for prediction of eukaryotic promoters
c) It uses a linear discriminant function
d) The linear discriminant function is combined with signal and content information
Answer: b [Reason:] BPROM predicts bacterial promoters. The linear discriminant function is combined with signal and content information such as consensus promoter sequence and oligonucleotide composition of the promoter sites. The program first predicts bacterial operon structures in a given sequence, using an intergenic distance of 100 bp as the basis for deciding whether genes belong to the same operon.
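The 100 bp intergenic-distance rule can be sketched as a single pass over genes sorted by start coordinate. This is not BPROM's code: the gene tuples are hypothetical, and requiring the same strand is an added assumption based on the gene-orientation rule mentioned earlier.

```python
def group_into_operons(genes, max_gap=100):
    """Group genes into candidate operons.

    genes: list of (name, start, end, strand) tuples sorted by start.
    Adjacent genes join the same operon when they share a strand and the
    gap between them is shorter than max_gap base pairs.
    """
    operons = []
    for gene in genes:
        _, start, _, strand = gene
        if operons:
            _, _, prev_end, prev_strand = operons[-1][-1]
            if strand == prev_strand and start - prev_end < max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


# Hypothetical gene coordinates.
GENES = [
    ("geneA", 100, 1000, "+"),
    ("geneB", 1050, 2000, "+"),   # 50 bp after geneA, same strand
    ("geneC", 2400, 3000, "+"),   # 400 bp gap starts a new operon
    ("geneD", 3020, 3900, "-"),   # close, but on the opposite strand
]
operon_names = [[g[0] for g in operon] for operon in group_into_operons(GENES)]
```

Only the first gene of each group would then be searched for an upstream promoter.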
9. In BPROM, once the operons are assigned, the program is able to predict putative promoter sequences.
Answer: a [Reason:] Most bacterial promoters are located within 200 bp of the protein coding region. Hence, the program is most effectively used when about 200 bp of upstream sequence of the first gene of an operon is supplied as input to increase specificity.
10. Which of the following is incorrect regarding FindTerm?
a) It is a program for searching bacterial ρ-independent termination signals located at the end of operons
b) It is a program for searching bacterial ρ-dependent termination signals located within the operons
c) The predictions are made based on matching of known profiles of the termination signals combined with energy calculations
d) It is available from the same site as FGENES and BPROM.
Answer: b [Reason:] The predictions are made based on matching of known profiles of the termination signals combined with energy calculations for the derived RNA secondary structures for the putative hairpin-loop structure. The sequence region that scores best in features and energy terms is chosen as the prediction. The information can sometimes be useful in defining an operon.
11. Which of the following is incorrect regarding the Prediction for Eukaryotes?
a) The consensus patterns are only derived from bioinformatics studies
b) The experimentally determined DNA binding sites are compiled into profiles and stored in a database for scanning an unknown sequence to find similar conserved patterns
c) The consensus patterns are derived from experimentally determined DNA binding sites
d) The ab initio method for predicting eukaryotic promoters and regulatory elements relies on searching the input sequences for matching of consensus patterns of known promoters and regulatory elements
Answer: a [Reason:] This approach tends to generate a very high rate of false positives owing to nonspecific matches with the short sequence patterns. Furthermore, because of the high variability of transcription factor binding sites, simple sequence matching often misses true promoter sites, creating false negatives.
12. To increase the specificity of prediction, a unique feature of eukaryotic promoter is employed, which is the presence of CpG islands.
Answer: a [Reason:] It is known that many vertebrate genes are characterized by a high density of CG dinucleotides near the promoter region overlapping the transcription start site. By identifying the CpG islands, promoters can be traced on the immediate upstream region from the islands.
By combining CpG islands and other promoter signals, the accuracy of prediction can be improved. Several programs have been developed based on the combined features to predict the transcription start sites in particular.
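CpG island detection is usually based on two statistics of a sequence window: GC content and the ratio of observed to expected CG dinucleotides. The sketch below computes both. The thresholds (length of at least 200 bp, GC content above 50%, observed/expected above 0.6) are a commonly used heuristic and are an assumption here, not something stated in the text above.

```python
def cpg_stats(sequence):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")      # CG cannot overlap itself, so count() is exact
    expected = c * g / n            # expectation if C and G were placed independently
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(sequence, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed heuristic thresholds to one window."""
    gc_content, ratio = cpg_stats(sequence)
    return len(sequence) >= min_length and gc_content > min_gc and ratio > min_ratio


gc_content, ratio = cpg_stats("CGCGCGCG")
```

A real scanner would slide this window along the genome and merge adjacent windows that pass.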
1. In bacteria, transcription is initiated by DNA polymerase.
Answer: b [Reason:] In bacteria, transcription is initiated by RNA polymerase, which is a multi-subunit enzyme. The σ subunit (e.g., σ70) of the RNA polymerase is the protein that recognizes specific sequences upstream of a gene and allows the rest of the enzyme complex to bind.
2. The upstream sequence where the σ protein binds constitutes the promoter sequence.
Answer: a [Reason:] This includes the sequence segments located 35 and 10 base pairs (bp) upstream from the transcription start site. They are also referred to as the −35 and −10 boxes. For the σ70 subunit in Escherichia coli, for example, the −35 box has a consensus sequence of TTGACA. The –10 box has a consensus of TATAAT.
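A naive search for the σ70 promoter can look for the two consensus boxes with a few mismatches allowed. In the sketch below the mismatch limit and the 15 to 19 bp spacer between the boxes are assumptions for illustration, and the test sequence is synthetic.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_sigma70_candidates(sequence, max_mismatch=1, spacer_range=(15, 19)):
    """Return (start of -35 box, start of -10 box, spacer length) for each candidate."""
    minus35, minus10 = "TTGACA", "TATAAT"
    hits = []
    for i in range(len(sequence) - 5):
        if mismatches(sequence[i:i + 6], minus35) > max_mismatch:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer
            box10 = sequence[j:j + 6]
            if len(box10) == 6 and mismatches(box10, minus10) <= max_mismatch:
                hits.append((i, j, spacer))
    return hits


# Synthetic sequence: both consensus boxes separated by 17 filler bases.
SEQUENCE = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
candidates = find_sigma70_candidates(SEQUENCE)
```

On real genomic sequence such a search returns many spurious candidates, for the same reason short motifs do elsewhere.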
3. The promoter sequence may determine the expression of one gene or a number of linked genes downstream.
Answer: a [Reason:] In the latter case, the linked genes form an operon. It is controlled by the promoter.
4. In addition to the RNA polymerase, there are also a number of DNA-binding proteins that facilitate the process of transcription.
Answer: a [Reason:] These proteins are called transcription factors. They bind to specific DNA sequences to either enhance or inhibit the function of the RNA polymerase.
5. The specific DNA sequences to which the transcription factors bind are referred to as _____
a) replication elements
b) blocking factors
c) transcription factors
d) regulatory elements
Answer: d [Reason:] The regulatory elements may lie in the vicinity of the promoter or at a site several hundred bases away from it. Regulatory proteins bound at a long distance can still exert their effect because DNA is flexible: it can bend and bring the transcription factors into close contact with the RNA polymerase complex.
6. In eukaryotes, gene expression is not regulated by a protein complex formed between transcription factors and RNA polymerase.
Answer: b [Reason:] Here, the gene expression is also regulated by a protein complex formed between transcription factors and RNA polymerase. However, eukaryotic transcription has an added layer of complexity in that there are three different types of RNA polymerase complexes, namely RNA polymerases I, II, and III.
7. Which of the following is untrue?
a) RNA polymerase I is responsible for the transcription of ribosomal RNA
b) RNA polymerase III is responsible for the transcription of tRNA
c) RNA polymerase II is exclusively responsible for transcribing protein-encoding genes
d) Synthesis of mRNAs is carried out by RNA polymerase I
Answer: d [Reason:] Each polymerase transcribes different sets of genes. RNA polymerase II is exclusively responsible for transcribing protein-encoding genes, that is, for the synthesis of mRNAs.
8. In eukaryotes, genes often form an operon with a shared promoter.
Answer: b [Reason:] Unlike in prokaryotes, where genes often form an operon with a shared promoter, each eukaryotic gene has its own promoter. The eukaryotic transcription machinery also requires many more transcription factors than its prokaryotic counterpart to help initiate transcription.
9. Eukaryotic RNA polymerase II does not directly bind to the promoter, but relies on a dozen or more transcription factors to recognize and bind to the promoter in a specific order before its own binding around the promoter.
Answer: a [Reason:] The core of many eukaryotic promoters is a so-called TATA box, located 30 bp upstream from the transcription start site, with the consensus motif TATA(A/T)A(A/T). However, not all eukaryotic promoters contain the TATA box. Many genes, such as housekeeping genes, do not have the TATA box in their promoters.
10. The TATA box is often used as an indicator of the presence of a promoter.
Answer: a [Reason:] In addition, many genes have a unique initiator sequence (Inr), which is a pyrimidine rich sequence with a consensus (C/T)(C/T)CA(C/T)(C/T). This site coincides with the transcription start site. Most of the transcription factor binding sites are located within 500 bp upstream of the transcription start site.
1. Which of the following is untrue regarding the classic yeast two-hybrid method?
a) It is used for the detection of protein interactions
b) It relies on the interaction of “bait” and “prey” proteins in molecular constructs in yeast
c) DNA-binding domain and a trans-activation domain don’t necessarily interact
d) In this strategy, a two-domain transcriptional activator is employed as a helper for determining protein–protein interactions
Answer: c [Reason:] The two domains which are a DNA-binding domain and a trans-activation domain normally interact to activate transcription. However, molecular constructs are made such that each of the two domains is covalently attached to each of the two candidate proteins (bait and prey).
2. If the bait and prey proteins _______ they bring the DNA-binding and trans-activation domains in such close proximity that they reconstitute the function of the transcription activator, turning ____ the expression of a reporter gene as a result.
Which of the following is not the correct pair of blanks?
a) physically interact, on
b) do not interact, on
c) do not interact, off
d) stop interacting, off
Answer: b [Reason:] Molecular constructs are made such that each of the two domains is covalently attached to one of the two candidate proteins. If the two candidate proteins do not interact, the reporter gene expression remains switched off.
3. Which of the following is untrue regarding the classic yeast two-hybrid method?
a) Protein–protein interaction networks of yeast and a small number of other species have been subsequently determined using this method
b) This technique is a high throughput approach
c) Each bait and prey construct has to be prepared individually to map interactions between all proteins
d) It has been systematically applied to study interactions at the whole proteome level
Answer: b [Reason:] This technique is essentially a low throughput approach. A major flaw in this method is that it is an indirect approach to probe protein–protein interaction and has a tendency to generate false positives (spurious interactions) and false negatives (undetected interactions). It has been estimated from proteome-wide characterizations that the rate of false positives can be as high as 50%.
4. An alternative approach to determining protein–protein interactions is to use a large-scale affinity purification technique that involves attaching fusion tags to proteins and purifying the associated protein complexes in an affinity chromatography column.
Answer: a [Reason:] The purified proteins are then analyzed by gel electrophoresis followed by MS for identification of the interacting components.
The protein microarray systems mentioned above also provide a high throughput alternative for studying protein–protein interactions.
5. Which of the following is untrue regarding predicting interactions based on domain fusion?
a) It is based on gene fusion events
b) This way of predicting protein–protein interactions is called the “Rosetta stone” method
c) A fused protein often reveals relationships between its domain components
d) A fused protein doesn’t necessarily reveal the relationships between its domain components
Answer: d [Reason:] The rationale goes like this: if A and B exist as interacting domains in a fusion protein in one proteome, the gene encoding the protein is a fusion gene. Their homologous gene sequences A and B existing separately in another genome most likely encode proteins interacting to perform a common function. Conversely, if ancestral genes A and B encode interacting proteins, they may have a tendency to be fused together in other genomes during evolution to enhance their effectiveness.
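The Rosetta stone rationale can be sketched with domain annotations held in dictionaries. Real implementations find the fused and separate homologs by sequence similarity searches; here the domain labels are assumed to be given, and all names are hypothetical.

```python
def rosetta_stone_pairs(fused_genome, split_genome):
    """Predict interacting protein pairs in split_genome from fusions in fused_genome.

    Both arguments map a protein name to a tuple of its domain names.
    """
    # Index the single-domain proteins of the second genome by their domain.
    single_domain = {
        domains[0]: name
        for name, domains in split_genome.items()
        if len(domains) == 1
    }
    predictions = []
    for fused_name, domains in fused_genome.items():
        for i, dom_a in enumerate(domains):
            for dom_b in domains[i + 1:]:
                if dom_a in single_domain and dom_b in single_domain:
                    predictions.append(
                        (single_domain[dom_a], single_domain[dom_b], fused_name)
                    )
    return predictions


# Hypothetical annotations: domA and domB are fused in one genome, separate in another.
FUSED = {"fusAB": ("domA", "domB")}
SPLIT = {"protA": ("domA",), "protB": ("domB",), "protC": ("domC",)}
pairs = rosetta_stone_pairs(FUSED, SPLIT)
```

The fused protein acts as the "Rosetta stone" that links protA and protB; protC has no fused counterpart and is left unpaired.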
6. When the two domains are located in two different proteins, to preserve the same functionality, their close proximity and interaction have to be preserved as well.
Answer: a [Reason:] In this method, by studying gene/protein fusion events, protein–protein interactions can be predicted. This prediction rule has been proven to be rather reliable and has since been successfully applied to a large number of proteins from both prokaryotes and eukaryotes.
7. The justification behind Rosetta stone method is that when two domains are fused in a single protein, they have to be in _______ proximity to perform a common function.
c) extremely distant
d) extremely close
Answer: d [Reason:] When the two domains are located in two different proteins, to preserve the same functionality, their close proximity and interaction have to be preserved as well. Therefore, by studying gene/protein fusion events, protein–protein interactions can be predicted.
8. In predicting interactions based on gene neighbors, if a certain gene linkage is found to be conserved across divergent genomes, it can be used as a strong indicator of the formation of an operon that encodes proteins that are functionally and even physically coupled.
Answer: a [Reason:] This rule of predicting protein–protein interactions holds up for most prokaryotic genomes. For eukaryotic genomes, gene order may be a less potent predictor of protein interactions than a tight co-regulation for gene expression.
9. Which of the following is untrue regarding predicting interactions based on phylogenetic information?
a) Proteins do not operate as a complex
b) This method detects the co-presence or co-absence of orthologs across a number of genomes
c) Protein interactions can be predicted using phylogenetic profiles
d) Phylogenetic profiles are defined as patterns of gene pairs that are concurrently present or absent across genomes
Answer: a [Reason:] The logic behind the co-occurrence approach is that proteins normally operate as a complex. If one of the components of the complex is lost, it results in the failure of the entire complex. Under the selective pressure, the rest of the nonfunctional interacting partners in the complex are also lost during evolution because they have become functionally unnecessary.
10. Which of the following is untrue regarding STRING?
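The co-occurrence idea can be sketched by encoding each protein as a presence/absence vector across genomes and pairing proteins whose vectors agree. The profiles and the mismatch threshold below are hypothetical.

```python
from itertools import combinations


def matching_profiles(profiles, max_differences=0):
    """Return protein pairs whose presence/absence profiles differ in at most
    max_differences genomes.

    profiles: dict mapping protein name to a tuple of 0/1 flags, one per genome.
    """
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        differences = sum(1 for x, y in zip(prof_a, prof_b) if x != y)
        if differences <= max_differences:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical profiles across four genomes (1 = ortholog present, 0 = absent).
PROFILES = {
    "P1": (1, 1, 0, 1),
    "P2": (1, 1, 0, 1),
    "P3": (0, 1, 1, 0),
}
linked = matching_profiles(PROFILES)
```

P1 and P2 are gained and lost together, so they are predicted to be functionally linked; P3 follows a different pattern.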
a) Search Tool for the Retrieval of Interacting Genes/Proteins
b) Functional associations include only the direct protein-protein interactions
c) It is based on combined evidence of gene linkage, gene fusion and phylogenetic profiles
d) It is a web server that predicts gene and protein functional associations
Answer: b [Reason:] Functional associations include both direct and indirect protein-protein interactions.
Indirect interactions can mean enzymes in the same pathway sharing a common substrate or proteins regulating each other in the genetic pathway.
1. The physical contacts between domains are crucial for the functioning of the cellular machinery.
Answer: a [Reason:] Interactions between domains occur in multidomain proteins, in stable complexes and in transient interactions between proteins that also exist independently. Experimental approaches for the large-scale determination of protein interactions are emerging. Theoretical analyses based on protein structures have unraveled some of the overall principles and features of the way domains evolved to interact with each other.
2. There exist three types of interactions between domains. Which of the following is not one of them?
a) Stable complex
b) Transient interaction
c) Multi-domain protein
d) Unstable interaction
Answer: d [Reason:] Interactions between domains determine the structure of multidomain proteins, in which there are several domains on one polypeptide chain. Given that all proteins consist of domains, interactions between domains also occur between the proteins that are permanently associated in stable complexes and proteins that interact transiently, but also exist independently of each other.
3. Stable complexes consist of proteins that are _____ associated with each other, like many ____ proteins for instance.
a) temporarily, oligomeric
b) temporarily, monomeric
c) permanently, oligomeric
d) permanently, monomeric
Answer: c [Reason:] Well-known stable complexes include the histone octamer, the ribosome and DNA and RNA polymerases. Transient interactions on the other hand are all those protein-protein interactions that occur between proteins that also exist independently.
4. Sets of proteins that are part of stable complexes and sets of proteins involved in transient interactions ____ in terms of the similarity in gene expression among the set of proteins.
a) are similar
c) are same
d) show similar function
Answer: b [Reason:] Proteins permanently associated in a stable complex need to be present or absent in the cell at the same time. Analysis of microarray data by Gerstein and co-workers has shown that the members of stable complexes in the yeast Saccharomyces cerevisiae have highly correlated gene expression patterns.
5. Correlation of gene expression for pairs of transiently interacting proteins is ______ compared to randomly chosen pairs of proteins.
a) not significant
b) only marginally significant
c) totally significant
d) significant to much extent
Answer: b [Reason:] In prokaryotes, genes are co-regulated if they are members of the same operon, and many proteins that are members of the same stable complex are part of the same operon. For instance, Ouzounis and Karp determined that over 90% of the enzymes that are in stable complexes in E. coli metabolic pathways are adjacent on the E. coli chromosome.
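Similarity in gene expression between two proteins is commonly measured with a correlation coefficient over their expression profiles. The sketch below uses the Pearson coefficient as one plausible choice; the source does not say which measure was used, and the expression values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels over four conditions.
subunit_a = [1.0, 2.0, 3.0, 4.0]
subunit_b = [2.0, 4.0, 6.0, 8.0]   # rises and falls with subunit_a
unrelated = [8.0, 6.0, 4.0, 2.0]   # moves in the opposite direction
r_complex = pearson(subunit_a, subunit_b)
r_other = pearson(subunit_a, unrelated)
```

Members of a stable complex would be expected to give coefficients close to 1, whereas transiently interacting pairs sit only slightly above the values seen for random pairs.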
6. Membership in a stable complex also differs from transient interaction in terms of evolutionary constraints upon sequence divergence.
Answer: a [Reason:] Thus the proteins in stable complexes are more similar across species, having higher sequence identity between orthologs, than the proteins in transient interactions. A calculation by Teichmann showed that there are significant differences between the average values for sequence identities between S. cerevisiae and S. pombe orthologs in stable complexes, transient interactions and monomers.
7. For proteins in stable complexes the average sequence identity is 46%, while for proteins in transient interactions it is 41%.
Answer: a [Reason:] Proteins not known to be involved in any type of interaction have an average sequence identity of 38%. One of the main reasons for this difference is that the surface area involved in the interfaces of stable complexes is larger than in transient complexes. Sequence divergence may be slower in order to conserve these extensive interfaces.
8. Which of the following is incorrect about Yeast-two-hybrid screens?
a) The yeast-two-hybrid system uses the transcription of a reporter gene driven by the Gal4 transcription factor to monitor whether or not two proteins are interacting
b) The DNA-binding domain chimeric protein will not bind upstream of the reporter gene
c) If the activation domain chimeric protein interacts with the DNA-binding domain chimeric protein, the reporter gene will be transcribed
d) Disadvantages of the method are that only pairwise interactions are tested, and not interactions that can only take place when multiple proteins come together, as well as a high false positive rate
Answer: b [Reason:] If the interaction between two proteins, A and B, is being tested, one of their genes would be fused to the DNA-binding domain of the Gal4 transcription factor (Gal4-DBD) while the other would be fused to the activation domain (Gal4-AD). The DNA-binding domain chimeric protein will bind upstream of the reporter gene. This experiment can be carried out hundreds or even thousands of times on microassay plates, as in the case of the study by Uetz and colleagues on S. cerevisiae (yeast) interactions. Each array element on these plates contains yeast cells transformed with a particular combination of two plasmids, one carrying the DNA-binding domain chimeric protein and the other the activation domain chimeric protein.
9. Which of the following is incorrect about Purification of protein complexes followed by mass spectrometry?
a) Isolating protein complexes from cells allows identification of interactions between ensembles of proteins instead of just pairs
b) Systematic purification of complexes on a large scale is done by tagging hundreds of genes with an epitope
c) Unlike in the yeast-two-hybrid assay, this does not involve chimeric genes
d) Affinity purification based on the epitope will then extract all the proteins attached to the bait protein from cell lysates
Answer: c [Reason:] As in the yeast-two-hybrid assay, this is done by making chimeric genes that are introduced into cells. The principle of mass spectrometric identification of proteins is that the protein is chopped into fragments by tryptic digestion, and the mass of each fragment is measured by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This measurement is so accurate that the combination of amino acids in each fragment can be calculated and compared to a database of all the proteins in the proteome of the organism in order to find the correct one.
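The database side of this comparison starts with an in silico tryptic digest of every candidate protein. The sketch below applies the usual trypsin rule (cleave after lysine or arginine unless the next residue is proline); that rule and the example sequence are assumptions added here, since the text above only names tryptic digestion.

```python
import re


def tryptic_digest(protein):
    """Split a protein into peptides after K or R, except when followed by P."""
    # Zero-width split: the position after K/R that is not followed by P.
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", protein) if peptide]


# Hypothetical sequence; the R at position 6 is followed by P and is not cleaved.
peptides = tryptic_digest("MKTAYRPGLKE")
# peptides == ["MK", "TAYRPGLK", "E"]
```

The theoretical mass of each peptide would then be computed and matched against the measured fragment masses.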