Multiple choice question for engineering
1. In ab initio methods for RNA structure prediction, the prediction results from RNAfold are not always guaranteed to be better than those predicted by Mfold.
Answer: a [Reason:] Because of the much larger number of secondary structures to be computed, a more simplified energy rule has to be used to increase computational speed. Thus, the prediction results are not always guaranteed to be better than those predicted by Mfold.
2. The comparative approach uses basic RNA structure based predictions to infer a consensus structure.
Answer: b [Reason:] The comparative approach uses multiple evolutionarily related RNA sequences to infer a consensus structure. This approach is based on the assumption that RNA sequences that are deemed to be homologous fold into the same secondary structure.
3. To distinguish the conserved secondary structure among multiple related RNA sequences, a concept of “covariation” is used.
Answer: a [Reason:] It is known that RNA functional motifs are structurally conserved. To maintain the secondary structures while the homologous sequences evolve, a mutation occurring in one position that is responsible for base pairing should be compensated for by a mutation in the corresponding base-pairing position so as to maintain base pairing and the stability of the secondary structure.
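The compensatory-mutation idea can be illustrated with a minimal Python sketch. The helper name `covarying_pairs` and the toy alignment are assumptions made for illustration only; this is not the scoring scheme of any particular program.

```python
# Toy illustration of covariation: report alignment column pairs that form a
# valid base pair in every sequence AND show more than one distinct pair,
# i.e. a change in one column is compensated by a change in the other.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def covarying_pairs(alignment):
    """alignment: list of equal-length, gap-free RNA strings (an assumption)."""
    length = len(alignment[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            observed = {(seq[i], seq[j]) for seq in alignment}
            if observed <= PAIRS and len(observed) > 1:
                hits.append((i, j))
    return hits

alignment = ["GCAAGC", "GGAACC", "GAAAUC"]
print(covarying_pairs(alignment))  # [(1, 4)]
```

Columns 1 and 4 hold C-G, G-C and A-U in the three sequences, so pairing is preserved although both positions mutate.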
4. ______ of covariation can be ______ to the RNA structure and functions
a) Any lack, deleterious
b) Any lack, benign
c) Any abundance, deleterious
d) Any inadequacy, advantageous
Answer: a [Reason:] Based on this rule, algorithms can be written to search for the covariation patterns after a set of homologous RNA sequences are properly aligned. The detected correlated substitutions help to determine conserved base pairing in a secondary structure.
5. An aspect of the comparative method is to select a _____ structure through consensus drawing.
a) relatively distinct
d) least abundant
Answer: c [Reason:] Predicting secondary structures for each individual sequence may produce errors. By comparing all predicted structures of a group of aligned RNA sequences and drawing a consensus, the commonly adopted structure can be selected; many other possible structures can be eliminated in the process.
6. The comparative-based algorithms can be further divided into two categories based on the type of input data.
Answer: a [Reason:] The comparative-based algorithms can be further divided as mentioned. One requires predefined alignment and the other does not.
7. The type of algorithm that requires predefined alignment requires the user to provide a _______ alignment as input.
a) not necessarily an alignment
b) multiple only
c) pairwise or multiple
d) pairwise only
Answer: c [Reason:] As the name suggests, this type of algorithm does require a predefined alignment, so option ‘a’ is ruled out. The sequence alignment can be obtained using standard alignment programs such as T-Coffee, PRRN, or Clustal. Based on the alignment input, the prediction programs compute structurally consistent mutational patterns such as covariation and derive a consensus structure common for all the sequences.
8. The type of algorithm that _____ predefined alignment is ______ for reasonably conserved sequences.
a) doesn’t require, more successful
b) requires, less successful
c) doesn’t require, relatively successful
d) requires, relatively successful
Answer: d [Reason:] The requirement for using this type of program is an appropriate set of homologous sequences that have to be similar enough to allow accurate alignment, but divergent enough to allow covariations to be detected. If this condition is not met, correct structures cannot be inferred.
9. The predefined alignment requiring method also depends on the quality of the input alignment. If there ___ errors in the alignment, covariation signals ____ detected.
a) are, will be
b) are, will not be
c) are not, will not be
d) are not, possibly will not be
Answer: b [Reason:] The selection of one single consensus structure is also a drawback because alternative and evolutionarily unconserved structures are not predicted. RNAalifold is an example of this type of program based on predefined aligned sequences.
10. Which of the following is true about RNAalifold?
a) Dynamic programming is not involved
b) Minimum free energy method is not used
c) Only minimum free energy is used
d) Covariation information is taken into consideration
Answer: d [Reason:] It is a program in the Vienna package. It uses a multiple sequence alignment as input to analyze covariation patterns on the sequences. A scoring matrix is created that combines minimum free energy and covariation information. Dynamic programming is used to select the structure that has the minimum energy for the whole set of aligned RNA sequences.
1. Which of the following is untrue about comparative genomics?
a) It is comparison of whole genomes from different organisms
b) It includes comparison of gene number, gene location, and gene content from these genomes
c) It provides insights into the mechanism of genome evolution and gene transfer among genomes
d) It doesn’t help to reveal the extent of conservation among genomes
Answer: d [Reason:] Comparative genomics does reveal the extent of conservation among genomes, so option d is the untrue statement. It also helps to understand the pattern of acquisition of foreign genes through lateral gene transfer and to reveal the core set of genes common among different genomes, which should correspond to the genes that are crucial for survival. This knowledge can be potentially useful in future metabolic pathway engineering.
2. Which of the following is untrue about Whole Genome Alignment?
a) This helps to reveal the presence of conserved functional elements
b) It doesn’t help to understand sequence conservation between genomes
c) It can be accomplished through direct genome comparison or genome alignment
d) The alignment at the genome level is fundamentally no different from the basic sequence alignment
Answer: b [Reason:] Regular alignment programs tend to be error prone and inefficient when dealing with long stretches of DNA containing hundreds or thousands of genes. Another challenge of genome alignment is effective visualization of alignment results. Because it is obviously difficult to sift through and make sense of the extremely large alignments, a graphical representation is a must for interpretation of the result.
3. Which of the following is untrue about LAGAN?
a) It stands for Limited Area Global Alignment of Nucleotides
b) It is a web-based program designed for pairwise alignment of small fragments of genomes only
c) It first finds anchors between two genomic sequences using an algorithm that identifies short, exactly matching words
d) Regions that have high density of words are selected as anchors
Answer: b [Reason:] LAGAN is a web-based program designed for pairwise alignment of large genomes. The unique feature of this program is that it is able to take into account degeneracy of the genetic codes and is therefore able to handle more distantly related genomes.
4. A minimal genome constitutes a _____ set of genes required for maintaining a free-living cellular organism.
c) highest number of set of
Answer: d [Reason:] Finding minimal genomes helps provide an understanding of genes constituting key metabolic pathways, which are critical for a cell’s survival. This analysis involves identification of orthologous genes shared between a number of divergent genomes.
5. CoreGenes is a web-based program that determines a ____ set of genes based on comparison of ____ small genomes.
a) vast, four
b) core, fifteen
c) core, four
d) vast, fifteen
Answer: c [Reason:] The user supplies NCBI accession numbers for the genomes of interest. The program performs an iterative BLAST comparison to find orthologous genes by using one genome as a reference and another as a query. This pairwise comparison is performed for all four genomes. As a result, the common genes are compiled as a core set of genes from the genomes.
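The final compilation step can be sketched in Python as a set intersection. This assumes orthologs have already been assigned shared identifiers; the iterative BLAST comparison itself is not reproduced, and `core_genes` and the genome contents are hypothetical.

```python
# Sketch: compile the core gene set as the genes present in every genome.
# Assumes ortholog detection has already mapped genes to shared identifiers.
def core_genes(genomes):
    """genomes: dict mapping genome name -> iterable of ortholog identifiers."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

genomes = {
    "genome1": ["dnaA", "gyrB", "recA", "rpoB"],
    "genome2": ["dnaA", "recA", "rpoB", "lacZ"],
    "genome3": ["recA", "rpoB", "dnaA"],
    "genome4": ["rpoB", "dnaA", "recA", "gyrB"],
}
print(sorted(core_genes(genomes)))  # ['dnaA', 'recA', 'rpoB']
```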
6. Which of the following is untrue about Lateral gene transfer?
a) It is also known as vertical gene transfer
b) There is exchange of genetic materials between species
c) It mainly occurs among prokaryotic organisms when foreign genes are acquired through mechanisms
d) One example of it is transformation
Answer: a [Reason:] Lateral gene transfer is defined as the exchange of genetic materials between species in a way that is incongruent with the commonly accepted vertical evolutionary pathway. Examples are transformation (direct uptake of foreign DNA from the environment), conjugation (gene uptake through mating behavior), and transduction (gene uptake mediated by infecting viruses). The transmission of genes between organisms can occur relatively recently or as a more ancient event.
7. A way to discern lateral gene transfer is through phylogenetic analysis, referred to as an ‘among-genome’ approach, which can be used to discover __________
a) recent lateral gene transfer events but almost negligible ancient events
b) recent lateral gene transfer events
c) ancient lateral gene transfer events
d) both recent and ancient lateral gene transfer events
Answer: d [Reason:] Abnormal groupings in phylogenetic trees are often interpreted as the possibility of lateral gene transfer events. There are some basic tools for identifying genomic regions that may be a result of lateral gene transfer events using the within-genome approach, namely, ACT, Swaap.
8. Within-Genome Approach is to identify regions within a genome with unusual compositions.
Answer: a [Reason:] Single or oligonucleotide statistics, such as G–C composition, codon bias, and oligonucleotide frequencies are used. Unusual nucleotide statistics in certain genomic regions versus the rest of the genome may help to identify “foreign” genes in a genome. A commonly used parameter is GC skew ((G − C)/(G + C)), which is compositional bias for G in a DNA sequence and is a commonly used indicator for newly acquired genetic elements.
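The GC skew formula (G − C)/(G + C) can be computed over sliding windows with a short Python sketch. The function names, window size and step are illustrative assumptions, not parameters taken from any specific tool.

```python
# Sketch: GC skew, (G - C) / (G + C), for a sequence and for sliding windows.
def gc_skew(seq):
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    # A window with no G or C has undefined skew; 0.0 is returned by convention here.
    return 0.0 if g + c == 0 else (g - c) / (g + c)

def windowed_gc_skew(seq, window, step):
    """Return the skew of each full-length window along the sequence."""
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

print(gc_skew("GGGC"))                       # 0.5
print(windowed_gc_skew("GGCCGGGG", 4, 4))    # [0.0, 1.0]
```

A region whose skew differs sharply from that of its neighbors would be a candidate for closer inspection.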
9. Which of the following is untrue about Gene Order Comparison?
a) When the order of a number of linked genes is conserved between genomes, it is called synteny
b) Generally, gene order is much more conserved compared with gene sequences.
c) Generally, gene order is much less conserved compared with gene sequences.
d) It is in fact rarely observed among divergent species.
Answer: b [Reason:] Gene order conservation is in fact rarely observed among divergent species. Therefore, comparison of syntenic relationships is normally carried out between relatively close lineages. However, if syntenic relationships for certain genes are indeed observed among divergent prokaryotes, they often provide important clues to functional relationships of the genes of interest.
10. Genes involved in the same metabolic pathway tend to be clustered among phylogenetically diverse organisms.
Answer: a [Reason:] The preservation of the gene order is a result of the selective pressure to allow the genes to be co-regulated and function as an operon. Furthermore, the synteny of genes from divergent groups often associates with physical interactions of the encoded gene products.
1. Comparative genomics includes a comparison of gene number, gene content, and gene location in both prokaryotic and eukaryotic groups of organisms.
Answer: a [Reason:] The availability of complete genome sequences makes possible a comparison of all of the proteins encoded by one genome, the proteome of that organism, with those of another. Because the genome sequence provides both the sequence and the map location of each gene, both the sequence and location can be compared.
2. Which of the following information do sequence comparisons not provide?
a) Gene relationships
b) Function history
c) Evolutionary history
d) Gene locations
Answer: d [Reason:] Map locations of orthologous genes may also be compared. If a set of genes is grouped together at a particular chromosomal location, and if a set of similar genes is also grouped together in the genome of another organism, these groups share an evolutionary history.
3. Which of the given statements is incorrect?
a) Proteins may be clustered into families on the basis of either sequence or structural similarity
b) Proteins often comprise separate domains
c) The number of protein sequences that are available is insufficient to determine that domain shuffling occurs in evolution
d) Proteins are modular
Answer: c [Reason:] The number of protein sequences is sufficient unlike mentioned in option c. The comparisons of proteomes of different organisms can identify the type of domain changes and also provide an indication as to what biological role they may have in a particular organism.
4. Which of the given statements is incorrect?
a) Two tandem copies of a gene are produced while Proteins with new functions are produced
b) Proteins with new functions are produced by a gene duplication event
c) Assortment and reassortment of protein domains takes place in individual genomes
d) In no case do the two duplicated genes both undergo change
Answer: d [Reason:] In one possible scenario, the two duplicated genes both undergo change, but interactions between the proteins stabilize the original function and support the evolution of new ones. Through mutation and natural selection, one of the copies can develop a new function, leaving the other copy to cover for the original function. However, because most mutations are deleterious to function, often one of the copies becomes a pseudogene. Not all gene duplications are thought to have the above effects.
5. Which of the given statements is incorrect?
a) The processes of domain assortment and gene duplication produce families of proteins in organisms
b) Following speciation, a newly derived genome will inherit the families of ancestor organisms, but will also develop new ones to meet evolutionary challenges
c) Comparison of each of the proteins encoded by an organism with every protein, an all-against-all comparison, reveals which protein families have been amplified and what rearrangements have occurred as steps in the evolutionary process
d) When two or more proteins in the proteome share a high degree of similarity they are least likely to be paralogs
Answer: d [Reason:] When two or more proteins in the proteome share a high degree of similarity because they share the same set of domains, they are likely to be paralogs, genes that arose by gene duplication events. Proteins that align over shorter regions share some domains, but also may not share others. Although gene duplication events could have created such variation, other rearrangements may have also occurred, blurring the evolutionary history.
6. Which of the given statements is incorrect about All-against-all Self-comparison?
a) A comparison of each protein in the proteome with all other proteins distinguishes unique proteins from proteins that have arisen from gene duplication, and also reveals the number of protein families but the domain content of these proteins cannot be known
b) In all-against-all proteome comparison, each protein is used as a query in a similarity search against the remaining proteome
c) In all-against-all proteome comparison, the similar sequences are ranked by the quality and length of the alignments found
d) In all-against-all proteome comparison, the search is conducted with each alignment score receiving a statistical evaluation (P or E value)
Answer: a [Reason:] The domain content of these proteins may also be analyzed. In all-against-all proteome comparison, a match between a query sequence and another proteome sequence with the same domain structure will produce a high-scoring, highly significant alignment. These proteins are designated paralogs because they have almost certainly originated from a gene duplication event.
7. Which of the given statements is incorrect about Cluster analysis?
a) Clustering organizes the proteins into groups by some objective criterion
b) One criterion for a matching protein pair is the statistical significance of their alignment score
c) The P or E value from BLAST searches cannot be the criterion for a matching protein pair
d) A criterion for clustering proteins is the distance between each pair of sequences in a multiple sequence alignment
Answer: c [Reason:] Option c is the negation of option b, so only one of them can hold. The lower the P or E value, the better the alignment. There will be a cutoff P or E value at which the matches in the BLAST search are no longer considered significant. A value of P or E = 0.01–0.05 is usually the point at which the alignment score is no longer considered to be significant in order to focus on a more closely related group of proteins.
8. Which of the given statements is incorrect about Clustering by making subgraphs?
a) Each sequence is a vertex and each pair of sequences that is matched with a significant alignment score is joined by an edge that is weighted according to the statistical significance of the alignment score
b) One way to identify the most strongly supported clusters is simply to add the most weakly supported edges in the graph
c) One way to identify the most strongly supported clusters is simply to remove the most weakly supported edges in the graph
d) An edge is weighted according to the statistical significance of the alignment score
Answer: b [Reason:] The most strongly supported clusters are identified by removing, not adding, the most weakly supported edges. As weaker and weaker links are removed, the remaining combinations of vertices and edges represent the most strongly linked sequences. This type of analysis was performed on an initial collection of E. coli genes by Labedan and Riley (1995).
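Edge removal followed by reading off the remaining subgraphs can be sketched in Python. Here edges are weighted by E value (lower means stronger support), and the function name `clusters`, the threshold and the example data are assumptions for illustration.

```python
# Sketch: cluster sequences by dropping weakly supported edges and then
# collecting the connected components that remain.
def clusters(vertices, edges, max_evalue):
    """edges: list of (u, v, evalue); edges with evalue > max_evalue are removed."""
    neighbors = {v: set() for v in vertices}
    for u, v, evalue in edges:
        if evalue <= max_evalue:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, result = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(component))
    return result

vertices = ["A", "B", "C", "D"]
edges = [("A", "B", 1e-50), ("B", "C", 1e-3), ("C", "D", 1e-40)]
print(clusters(vertices, edges, 1e-2))   # [['A', 'B', 'C', 'D']]
print(clusters(vertices, edges, 1e-10))  # [['A', 'B'], ['C', 'D']]
```

Tightening the threshold removes the weak B-C link and splits one loose group into two strongly supported clusters.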
9. Which of the given statements is incorrect about Clustering by single linkage?
a) In the first step, a group of related sequences found in the all-against-all proteome comparison is subjected to a multiple sequence alignment, usually by CLUSTALW
b) A neighbor-joining algorithm is rarely used in this method
c) This procedure and the algorithms are the same as those used to make a phylogenetic tree by the distance methods
d) A distance matrix that shows the number of amino acid changes between each pair of sequences is made
Answer: b [Reason:] The matrix is then used to cluster the sequences by a neighbor-joining algorithm. These methods produce a tree or a different representation of the tree called a dendrogram, which minimizes the number of amino acid changes that would generate the group of sequences.
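The distance matrix step can be sketched in Python as a count of differing positions between each pair of aligned sequences. The function name and toy sequences are assumptions, and the neighbor-joining step that would consume this matrix is not shown.

```python
# Sketch: pairwise distance matrix counting amino acid differences between
# sequences taken from a multiple sequence alignment (equal lengths assumed).
def distance_matrix(aligned):
    """aligned: dict mapping sequence name -> aligned sequence string."""
    names = list(aligned)
    return {
        a: {b: sum(x != y for x, y in zip(aligned[a], aligned[b])) for b in names}
        for a in names
    }

aligned = {"p1": "MKTAY", "p2": "MKSAY", "p3": "MRSAF"}
matrix = distance_matrix(aligned)
print(matrix["p1"]["p2"], matrix["p1"]["p3"], matrix["p2"]["p3"])  # 1 3 2
```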
10. The all-against-all analyses provide an indication as to the number of protein/gene families in an organism. This number represents the core proteome of the organism from which all biological functions have diversified.
Answer: a [Reason:] In Hemophilus, 1247 of the total number of 1709 proteins do not have paralogs. The core proteomes of the worm and fly are similar in size but with a greater number of duplicated genes in the worm. It is quite remarkable that the core proteome of the multicellular organisms (worm and fly) is only twice that of yeast.
1. In FASTA, for a Z-score > 15, the match can be considered extremely ______ with _____ of a homologous relationship.
a) insignificant, uncertainty
b) significant, uncertainty
c) significant, certainty
d) insignificant, certainty
Answer: c [Reason:] If Z is in the range of 5 to 15, the sequence pair can be described as highly probable homologs. If Z < 5, their relationship is described as less certain.
2. BLAST uses a _______ to find matching words, whereas FASTA identifies identical matching words using the _____
a) substitution matrix, hashing procedure
b) substitution matrix, blocks
c) hashing procedure, substitution matrix
d) ktups, substitution matrix
Answer: a [Reason:] BLAST and FASTA have been shown to perform almost equally well in regular database searching; however, there are some notable differences between the two approaches. The major difference is in the seeding step: BLAST uses a substitution matrix to find matching words, whereas FASTA identifies identical matching words using the hashing procedure.
3. Which of the following is not a benefit of, or a fact about, FASTA compared with BLAST?
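The hashing idea behind identical word matching can be sketched in Python: index every word (ktup) of the query in a lookup table, then scan the subject and tally hits per diagonal. The function names and sequences are illustrative assumptions; this is not FASTA's actual implementation.

```python
from collections import defaultdict

# Sketch: hash every k-letter word of the query to its positions, then look up
# each subject word and count identical matches on each diagonal (i - j).
def ktup_index(seq, k):
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def diagonal_hits(query, subject, k):
    index = ktup_index(query, k)
    diagonals = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            diagonals[i - j] += 1
    return dict(diagonals)

print(diagonal_hits("ACGTACGT", "TTACGTAA", 4))  # {2: 2, -2: 2}
```

Diagonals with many word hits correspond to candidate regions of similarity.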
a) FASTA scans smaller window sizes
b) It gives more sensitive results
c) It gives less sensitive results
d) It gives results with a better coverage rate for homologs
Answer: c [Reason:] By default, FASTA scans smaller window sizes. Thus, it gives more sensitive results than BLAST, with a better coverage rate for homologs. However, it is usually slower than BLAST.
4. The use of low-complexity masking in the BLAST procedure means that it may have higher specificity than FASTA because potential false positives are reduced.
Answer: a [Reason:] In addition to the given statement, BLAST sometimes gives multiple best-scoring alignments from the same sequence. FASTA returns only one final alignment.
5. Which of the following is not a benefit of BLAST?
a) Handling of gaps
c) More sensitive
d) Statistical rigor
Answer: a [Reason:] In addition to this, user friendly UI of BLAST is also one of its benefits. However, it does not handle gaps well. In that case gapped BLAST is better.
6. BLAST might not find matches for very short sequences.
Answer: a [Reason:] In BLAST, similarity matching of words is involved. If no words are found similar, then no alignment is detected and hence it might not find matches for very short sequences.
7. BLAST often produces several short HSPs rather than a single aligned region.
Answer: a [Reason:] The results of the word matching and the attempts to extend the alignment are segments called HSPs (High-Scoring Segment Pairs). BLAST often produces several short HSPs rather than a single aligned region.
8. FASTA is derived from logic of the dot plot.
Answer: a [Reason:] Because of this, it computes best diagonals from all frames of alignment. The method looks for exact matches between words in query and test sequence.
9. The gapped portion in the diagonals represents matches in FASTA.
Answer: b [Reason:] The diagonal’s nature indicates the matching of the sequences. After all diagonals are found, FASTA tries to join diagonals by adding gaps. Further, it computes alignments in regions of the best diagonals.
10. The header line of the FASTA format begins with the ____ symbol.
Answer: a [Reason:] The format is simple and is used by almost all programs. The header line has > at the beginning. There are no specific requirements for line length, characters, etc.
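Reading this format can be sketched in a few lines of Python. The function name `parse_fasta` and the example records are assumptions for illustration.

```python
# Sketch: parse FASTA text, where each record starts with a ">" header line
# and the sequence may be wrapped over any number of following lines.
def parse_fasta(text):
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records

text = ">seq1 example\nACGT\nAC\n>seq2\nGG\n"
print(parse_fasta(text))  # {'seq1 example': 'ACGTAC', 'seq2': 'GG'}
```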
1. Which of the following is untrue about SAGE?
a) This approach is much more efficient than the EST analysis
b) This approach is quite less efficient than the EST analysis
c) It uses a short nucleotide tag to define a gene transcript
d) It allows sequencing of multiple tags in a single clone
Answer: b [Reason:] If an average clone has a size of 700 bp, it can contain up to 50 sequence tags (15 bp each), which means that the SAGE method can be at least fifty times more efficient than the brute force EST sequencing and counting. Therefore, the SAGE analysis has a better chance of detecting weakly expressed genes.
2. Which of the following is untrue about SAGE?
a) Sequencing is the most costly and time-consuming step
b) Here, sequencing is economical but time-consuming step
c) Sequencing is economical but time-reducing step
d) It is difficult to know how many tags need to be sequenced to get a good coverage of the entire transcriptome
Answer: a [Reason:] It is generally determined on a case-by-case basis. As a rule of thumb, 10,000 clones representing approximately 500,000 tags from each sample are sequenced. The scale and cost of the sequencing required for SAGE analysis are prohibitive for most laboratories. Only large sequencing centers can afford to carry out SAGE analysis routinely.
3. Which of the following is untrue about the drawbacks of SAGE?
a) One or two sequencing errors in the tag sequence can lead to ambiguous or erroneous tag identification
b) Correctly sequenced SAGE tag sometimes may correspond to several genes or no gene at all
c) Correctly sequenced SAGE tag always corresponds to several genes
d) The drawback with this approach is the sensitivity to sequencing errors
Answer: c [Reason:] To improve the sensitivity and specificity of SAGE detection, the lengths of the tags need to be increased for the technique. There are some comprehensive software tools for SAGE analysis viz. SAGEmap, SAGExProfiler.
4. SAGEmap is a SAGE database created by NCBI.
Answer: a [Reason:] Given a cDNA sequence, one can search SAGE libraries for possible SAGE tags and perform “virtual” Northern blots that indicate the relative abundance of a tag in a SAGE library. Each output is hyperlinked to a particular UniGene entry with sequence annotation.
5. SAGExProfiler doesn’t provide information about overexpressed or silenced genes
Answer: b [Reason:] It is a web-based program that allows a “virtual subtraction” of an expression profile of one library (e.g., normal tissue) from another (e.g., diseased tissue). Comparison of the two libraries can provide information about overexpressed or silenced genes in normal versus diseased tissues.
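The "virtual subtraction" idea can be sketched in Python as a per-tag difference of counts between two libraries. This toy version uses raw counts and hypothetical tags; a real comparison would normalize for library size, and the function name is an assumption.

```python
from collections import Counter

# Sketch: subtract the tag counts of one SAGE library from another.
# Positive values suggest higher expression in the diseased library,
# negative values suggest reduced or silenced expression.
def virtual_subtraction(normal_tags, diseased_tags):
    normal, diseased = Counter(normal_tags), Counter(diseased_tags)
    return {tag: diseased[tag] - normal[tag]
            for tag in set(normal) | set(diseased)}

normal = ["AAA", "AAA", "CCC"]
diseased = ["AAA", "GGG", "GGG"]
result = virtual_subtraction(normal, diseased)
print(sorted(result.items()))  # [('AAA', -1), ('CCC', -1), ('GGG', 2)]
```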
6. Which of the following is untrue about SAGE Genie?
a) It is an NCBI web-based program
b) It allows matching of experimentally obtained SAGE tags to known genes
c) It provides an interface for visualizing human gene expression
d) It doesn’t filter out linker sequences from experimentally obtained SAGE tags
Answer: d [Reason:] It has a filtering function that filters out linker sequences from experimentally obtained SAGE tags and allows expression pattern comparison between normal and diseased human tissues. The data output can be presented using subprograms such as the Anatomic Viewer, Digital Northern, and Digital Gene Expression Display.
7. Which of the following is an incorrect statement?
a) SAGE and DNA microarrays are both high throughput techniques that determine global mRNA expression levels
b) Studies have indicated that the gene expression measurements from these methods are highly inconsistent with each other
c) SAGE does not require prior knowledge of the transcript sequence
d) DNA microarray experiments can only detect the genes spotted on the microarray
Answer: b [Reason:] SAGE has the potential to allow discovery of new, yet unknown gene transcripts. This is possible because SAGE is able to measure all the mRNA expressed in a sample.
8. DNA microarrays measure “absolute” mRNA expression levels without arbitrary reference standards, whereas SAGE indicates the relative expression levels.
Answer: b [Reason:] SAGE measures “absolute” mRNA expression levels without arbitrary reference standards, whereas DNA microarrays indicate the relative expression levels. Therefore, SAGE expression data are more comparable across experimental conditions and platforms. This makes public SAGE databases more informative by allowing comparison of data from reference conditions with various experimental treatments.
9. The PCR amplification step involved in the SAGE procedure means that it requires a large quantity of sample mRNA.
Answer: b [Reason:] The PCR amplification step involved in the SAGE procedure means that it requires only a minute quantity of sample mRNA. This compares favorably to the requirement for a much larger quantity of mRNA for microarray experiments, which may be impossible to obtain under certain circumstances.
10. Which of the following is an incorrect statement?
a) Collecting a SAGE library is very labor intensive and expensive
b) Collecting a SAGE library is quite economical
c) SAGE is not suitable for rapid screening of cells
d) Gene identification from SAGE data is also more cumbersome
Answer: b [Reason:] Gene identification from SAGE data is also more cumbersome because the mRNA tags have to be extracted, compiled, and identified computationally, whereas in DNA microarrays, the identities of the probes are already known. In SAGE, comparison of gene expression profiles to discover differentially expressed genes and co-expressed genes is performed manually, whereas for microarrays, there are a large number of software algorithms to automate the process.