Multiple choice question for engineering
1. Progressive alignment methods use the dynamic programming method to build a MSA starting with the most related sequences and then progressively adding less related sequences or groups of sequences to the initial alignment
Answer: a [Reason:] The progressive alignment methods use the dynamic programming method. Relationships among the sequences are modeled by an evolutionary tree in which the outer branches or leaves are the sequences. The tree is based on pair-wise comparisons of the sequences using one of the phylogenetic methods.
2. Progenitor sequences represented by the ______ branches of the tree are derived by alignment of the _______ sequences.
a) outer, outermost
b) inner, outermost
c) inner, innermost
d) outer, innermost
Answer: b [Reason:] Progenitor sequences represented by the inner branches of the tree are derived by alignment of the outermost sequences. These inner branches will have uncertainties where positions in the outermost sequences are dissimilar.
3. CLUSTALW is a more recent version of CLUSTAL with the W standing for ________
Answer: c [Reason:] The W in CLUSTALW stands for ‘weighting’ to represent the ability of the program to provide weights to the sequence and program parameters. CLUSTAL has been around for more than 10 years and lots of improvements in the program have been made.
4. The CLUSTALX provides a graphic interface.
Answer: a [Reason:] Two examples of programs that use progressive methods are CLUSTALW and the Genetics Computer Group program PILEUP. CLUSTALX provides a graphic interface.
These changes provide more realistic alignments that should reflect the evolutionary changes in the aligned sequences and the more appropriate distribution of gaps between conserved domains.
5. Which of the following is untrue about CLUSTAL program?
a) CLUSTAL performs a global-multiple sequence alignment by a different method than MSA (Multiple Sequence Alignment)
b) The initial heuristic alignment obtained by MSA is calculated in a different way
c) The initial step includes performing pair-wise alignments of all of the sequences
d) The intermediate step includes using the alignment scores to produce a phylogenetic tree
Answer: b [Reason:] The initial heuristic alignment obtained by MSA is calculated the same way, although CLUSTAL performs a global multiple sequence alignment by a different method than MSA (Multiple Sequence Alignment). Options c and d describe the first two steps; the last step is aligning the sequences sequentially, guided by the phylogenetic relationships indicated by the tree.
6. The initial alignments used to produce the guide tree may be obtained by various methods. Which of the following is not one of them?
a) Fast k-tuple
b) Pattern-finding approach similar to FASTA
d) Faster, full dynamic programming method
Answer: d [Reason:] The methods used may be a fast k-tuple or pattern-finding approach similar to FASTA, which is useful for many sequences, or the full dynamic programming method. Option d is incorrect because the full dynamic programming method is slower than the other methods listed.
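Illustration: the idea behind a fast k-tuple comparison is to score two sequences by the short words they share instead of running a full alignment. The sketch below is a simplified stand-in, not the exact procedure used by CLUSTAL or FASTA; the function names and the choice of a Jaccard-style score are assumptions made for illustration.

```python
def ktuples(seq, k):
    """Set of all overlapping k-tuples (words of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ktuple_similarity(a, b, k=2):
    """Shared k-tuples divided by all distinct k-tuples (Jaccard index)."""
    words_a, words_b = ktuples(a, k), ktuples(b, k)
    union = words_a | words_b
    if not union:
        return 0.0  # both sequences are shorter than k
    return len(words_a & words_b) / len(union)
```

Because only word sets are compared, the cost grows roughly linearly with sequence length, which is why such approaches suit large sets of sequences.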
7. The scoring of gaps in a MSA (Multiple Sequence Alignment) has to be performed in a different manner from scoring gaps in a pair-wise alignment
Answer: a [Reason:] As more sequences are added to a profile of an existing MSA, gaps accumulate and influence the alignment of further sequences. CLUSTALW calculates gaps in a novel way designed to place them between conserved domains.
8. Like other alignment programs, CLUSTAL uses a null score for opening a gap in a sequence alignment and a penalty for extending the gap by one residue.
Answer: b [Reason:] CLUSTAL uses a penalty for opening a gap in a sequence alignment and an additional penalty for extending the gap by one residue. These penalties are user-defined. Gaps found in the initial alignments remain fixed. New gaps introduced as more sequences are added also receive this same gap penalty, even when they occur within an existing gap, but the gap penalties for an alignment are then modified according to the average match value in the substitution matrix, the percent identity between the sequences, and the sequence lengths.
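Illustration: an open-plus-extend (affine) gap penalty can be sketched as below. The penalty values are hypothetical, not CLUSTAL defaults, and the sketch assumes one common convention in which the first gapped position pays the opening penalty and each further position pays the extension penalty. It does not model the later modification of penalties described above.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.2):
    """Penalty for one gap of `length` residues under an affine scheme.

    gap_open and gap_extend are illustrative values, not program defaults.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def total_gap_penalty(aligned_row, gap_char="-", **kwargs):
    """Sum the affine penalties of every gap run in one aligned sequence."""
    total, run = 0.0, 0
    for ch in aligned_row:
        if ch == gap_char:
            run += 1
        else:
            total += affine_gap_penalty(run, **kwargs)
            run = 0
    # Account for a gap run that reaches the end of the row.
    return total + affine_gap_penalty(run, **kwargs)
```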
9. Which of the following is untrue about PILEUP program?
a) It is the MSA program that is a part of the Genetics Computer Group package of sequence analysis programs
b) It has been owned since 1997 by Oxford Molecular Group, and is widely used due to the popularity and availability of this package
c) It uses a method for MSA that is polar opposite to CLUSTALW
d) The sequences are aligned pair-wise using the Needleman-Wunsch dynamic programming algorithm
Answer: c [Reason:] PILEUP uses a method for MSA that is very similar to CLUSTALW. The sequences are aligned pair-wise using the Needleman-Wunsch dynamic programming algorithm, and the scores are used to produce a tree by the unweighted pair-group method using arithmetic averages. The resulting tree is then used to guide the alignment of the most closely related sequences and groups of sequences. The resulting alignment is a global alignment produced by the Needleman-Wunsch algorithm.
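Illustration: the tree-building step named above, the unweighted pair-group method using arithmetic averages (UPGMA), can be sketched as follows. This is a minimal sketch, not PILEUP's implementation: the input is assumed to be a table of pairwise distances (how alignment scores are converted to distances is not shown), and the function name and data layout are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a guide tree by UPGMA.

    labels: list of sequence names.
    dist:   dict mapping frozenset({a, b}) -> distance between names a and b.
    Returns a nested tuple; the closest sequences are joined first.
    """
    # Each cluster is stored as (tree, number_of_leaves).
    clusters = {name: (name, 1) for name in labels}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merged = a + "+" + b
        # The size-weighted mean equals the arithmetic average over all
        # leaf pairs between the merged cluster and each other cluster.
        for other in clusters:
            d[frozenset((merged, other))] = (
                n_a * d[frozenset((a, other))] + n_b * d[frozenset((b, other))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    return next(iter(clusters.values()))[0]
```

The innermost tuples of the result are the pairs that a progressive method would align first.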
10. The resulting tree is then used to guide the alignment of the most closely related sequences and groups of sequences. The resulting alignment is a global alignment produced by the Needleman-Wunsch algorithm.
Answer: a [Reason:] The very first sequences to be aligned are the most closely related on the sequence tree. If these sequences align very well, there will be few errors in the initial alignments. However, the more distantly related these sequences, the more errors will be made, and these errors will be propagated to the MSA. There is no simple way to circumvent this problem. A second problem with the progressive alignment method is the choice of suitable scoring matrices and gap penalties that apply to the set of sequences.
1. Phylogenetic analysis of a set of sequences that aligns ______ is straightforward because the positions that correspond in the sequences can be readily identified in a ______ of the sequences.
a) very well, multiple sequence alignment
b) in a haphazard manner, multiple sequence alignment
c) in a distorted way, multiple sequence alignment
d) very well, self alignment
Answer: a [Reason:] Option d, here, becomes irrelevant as there is phylogenetic analysis involved. The types of changes in the aligned positions or the numbers of changes in the alignments between pairs of sequences then provide a basis for a determination of phylogenetic relationships among the sequences by the above methods of phylogenetic analysis.
2. For sequences that have ______, a phylogenetic analysis is_______
a) diverged considerably, more challenging
b) not diverged, more challenging
c) diverged considerably, less challenging
d) diverged considerably, less work to do
Answer: a [Reason:] Options a and b contradict each other. For diverged sequences, the number of analysis steps increases as well. A determination of the sequence changes that have occurred is more difficult because the multiple sequence alignment may not be optimal and because multiple changes may have occurred in the aligned sequence positions.
3. The choice of a suitable multiple sequence alignment method depends on the degree of variation among the sequences.
Answer: a [Reason:] The degree of variation also sometimes affects the efficiency and nature of the output. Once a suitable alignment has been found, one may also ask how well the predicted phylogenetic relationships are supported by the data in the multiple sequence alignment.
4. In the bootstrap method, the data are resampled by _____ choosing _____ columns from the aligned sequences to produce, in effect, a new sequence alignment of the_____
a) randomly, horizontal, same length
b) specifically, vertical, different lengths
c) randomly, vertical, same length
d) randomly, vertical, different lengths
Answer: c [Reason:] Each column of data may be used more than once and some columns may not be used at all in the new alignment. Trees are then predicted from many of these alignments of resampled sequences (Felsenstein 1988).
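Illustration: the column resampling described above can be sketched in a few lines. The function name and the list-of-strings alignment layout are assumptions for illustration; tree prediction from each resampled alignment is not shown.

```python
import random


def bootstrap_alignment(alignment, rng=None):
    """Resample alignment columns with replacement.

    alignment: list of equal-length aligned sequences (rows).
    Returns a new alignment with the same number of rows and columns.
    """
    rng = rng or random.Random()
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    # Draw column indices with replacement: some repeat, some are skipped.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(row[i] for i in picks) for row in alignment]
```

Each call yields one pseudo-replicate; a bootstrap analysis would generate many of them and count how often each branch reappears.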
5. In the bootstrap method, for branches in the predicted tree topology to be significant, the resampled data sets should frequently predict the same branches.
Answer: a [Reason:] Bootstrap analysis is supported by most of the commonly used phylogenetic inference software packages and is commonly used to test tree branch reliability. Another method of testing the reliability of one part of the tree is to collapse two branches into a common node (Maddison and Maddison 1992).
6. In the final steps of the bootstrap method, the _____ the decay value, the ___ significant the original branches.
a) greater, less
b) greater, more
c) lesser, more
d) more, less
Answer: b [Reason:] In the final steps of the bootstrap method, the tree length is again evaluated and compared to the original length, and any increase is the decay value. The greater the decay value, the more significant the original branches. In addition to these methods, there are some additional recommendations that increase confidence in a phylogenetic prediction.
7. A common recommendation is to use at least two of the methods—maximum parsimony, distance, or maximum likelihood, for the analysis.
Answer: a [Reason:] If two of these methods provide the same prediction, confidence in the prediction is much higher. Another recommendation is to pay careful attention to the evolutionary assumptions and models that are used for both sequence alignment and tree construction.
8. The traditional use of phylogenetic analysis is to discover evolutionary relationships among species.
Answer: a [Reason:] In such cases, a suitable gene or DNA sequence that shows just enough, but not too much, variation among a group of organisms is selected for phylogenetic analysis. For example, analysis of mitochondrial sequences is used to discover evolutionary relationships among mammals.
9. Two more recent uses of phylogenetic analysis are to analyze ______ and to trace the evolutionary history of specific genes. Which of the following could not be the correct blank?
a) gene families
d) physical separation methods
Answer: d [Reason:] Option d refers to laboratory operations, unlike the computational analyses in the other options. For example, database similarity searches may identify several proteins in a plant genome that are similar to a yeast query protein.
10. Tracking the evolutionary history of individual genes in a group of species can reveal which genes have remained in a genome for a long time and which genes have been horizontally transferred between species.
Answer: a [Reason:] Thus, phylogenetic analysis can also contribute to an understanding of genome evolution. For example, from a phylogenetic analysis of a protein family, the plant gene most closely related to the yeast gene, and therefore most likely to have the same function, can be determined.
1. At present, there are essentially two types of methods for RNA structure prediction: one is a minimum free energy approach and the second is a comparative approach.
Answer: a [Reason:] One is based on the calculation of the minimum free energy of the stable structure derived from a single RNA sequence. This can be considered an ab initio approach. The second is a comparative approach which infers structures based on an evolutionary comparison of multiple related RNA sequences.
2. Ab initio approach makes structural predictions based on ______
a) a single RNA sequence
b) comparing RNA sequences
c) evolutionary basis
d) pure phylogenetics
Answer: a [Reason:] The rationale behind this method is that the structure of an RNA molecule is solely determined by its sequence. Thus, algorithms can be designed to search for a stable RNA structure with the lowest free energy.
3. In the ab initio approach, generally, when a base pair is formed, the energy of the molecule is ____ because of attractive interactions between the two strands.
d) kept stable
Answer: a [Reason:] Here, the algorithms can be designed to search for a stable RNA structure with the lowest free energy. Thus, to search for a most stable structure, ab initio programs are designed to search for a structure with the maximum number of base pairs.
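Illustration: searching for the structure with the maximum number of base pairs can be sketched with a Nussinov-style dynamic program. This is a simplification chosen for illustration: it only counts nested pairs and ignores the stacking and loop energies that real free-energy programs use. The minimum hairpin loop size of 3 and the inclusion of G-U pairs are assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    min_loop is the assumed minimum number of unpaired bases in a hairpin.
    Counts pairs only; it does not compute free energies.
    """
    pairs = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the answer for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, enclosing a loop.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```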
4. In ab initio methods, free energy can be calculated based on parameters empirically derived for small molecules.
Answer: a [Reason:] G–C base pairs are more stable than A–U base pairs, which are more stable than G–U base pairs. It is also known that base-pair formation is not an independent event.
5. The energy necessary to form individual base pairs is not quite affected by adjacent base pairs.
Answer: b [Reason:] The energy necessary to form individual base pairs is influenced by adjacent base pairs through helical stacking forces. This is known as co-operativity in helix formation.
6. The attractive interactions lead to ____ energy.
d) no change in
Answer: c [Reason:] If a base pair is next to other base pairs, the base pairs tend to stabilize each other through attractive stacking interactions between aromatic rings of the base pairs. The attractive interactions lead to even lower energy. Parameters for calculating the co-operativity of the base-pair formation have been determined and can be used for structure prediction.
7. If the base pair is adjacent to loops or bulges, the neighboring loops and bulges tend to ______ the base-pair formation.
a) have no change on
b) decrease the energy
Answer: d [Reason:] This is because there is a loss of entropy when the ends of the helical structure are constrained by unpaired loop residues. The destabilizing force to a helical structure also depends on the types of loops nearby.
8. The scoring scheme based on the combined stabilizing and destabilizing interactions forms the foundation of the ab initio RNA secondary structure prediction method.
Answer: a [Reason:] Parameters for calculating different destabilizing energies have also been determined and can be used as penalties for secondary structure calculations. This method works by first finding all possible base-pairing patterns from a sequence and then calculating the total energy of a potential secondary structure by taking into account all the adjacent stabilizing and destabilizing forces.
9. Ab initio methods are energetically least favorable.
Answer: b [Reason:] These methods are energetically most favorable. If there are multiple alternative secondary structures, the method finds the conformation with the lowest energy.
10. The dot matrix method and the dynamic programming method can be used in detecting self-complementary regions of a sequence.
Answer: a [Reason:] A simple dot matrix can find all possible base-pairing patterns of an RNA sequence when the sequence is compared with itself. In this case, dots are placed in the matrix to represent matching complementary bases instead of identical ones.
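Illustration: a self-comparison dot matrix that marks complementary rather than identical bases can be sketched as below. The function name is hypothetical, and counting G-U wobble pairs as complementary is an assumption.

```python
def complement_dot_matrix(seq):
    """Dot matrix of an RNA sequence against itself.

    Returns a list of strings; '*' marks positions (i, j) whose bases
    can pair, '.' marks positions that cannot.
    """
    can_pair = {"AU", "UA", "GC", "CG", "GU", "UG"}
    return [
        "".join("*" if a + b in can_pair else "." for b in seq)
        for a in seq
    ]
```

Printing the rows one per line shows potential stems as runs of marks along anti-diagonals, since a helix pairs position i+1 with position j-1.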
1. Sequencing of genomes depends on the assembly of a large number of DNA reads into a linear, contiguous DNA sequence.
Answer: a [Reason:] The cost and efficiency of this process have been greatly improved by automatic methods of sequence assembly, first used for the sequencing of the bacterium Haemophilus influenzae. This same method of assembly was also used, in part, to complete the sequencing of the Drosophila and human genomes in a timely manner.
2. Each genome sequence is scanned for protein-encoding genes using gene models trained on known gene sequences from the same organism.
Answer: a [Reason:] For a new genome, each predicted gene is translated into a protein sequence; the collection of protein sequences encoded by the genome is the proteome of the organism. Every protein in the proteome is then used as a query sequence in a database similarity search. Matching database sequences are realigned with the query sequence to evaluate the extent and significance of the alignment.
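Illustration: translating a predicted coding sequence into a protein can be sketched as below. The sketch assumes the standard genetic code, a DNA sequence that already starts in frame, and no introns; the names are chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame DNA coding sequence, stopping at a stop codon.

    Codons containing ambiguous bases are reported as 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Applying such a function to every predicted gene yields the set of query sequences for the database similarity search.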
3. Screening the predicted protein sequences against ______ library confirms the prediction and expression of the gene.
a) expressed sequence tag (EST)
Answer: a [Reason:] The collective information on proteome function can then be further analyzed by self-comparison to find duplicated genes (paralogs) and by a proteome-by-proteome comparison to identify orthologs, genes that have maintained the same function through speciation, and other sequence and evolutionary relationships that are important for metabolic, regulatory, and cellular functions.
4. In the case of genome sequence assembly, which of the given statements is incorrect?
a) Full chromosomal sequences are assembled from the overlaps in a highly redundant set of fragments by an automatic computational method or from the fragment order on a physical map
b) Chromosome cloning is carried out in bacterial artificial chromosomes (BACs)
c) Chromosomes of a target organism are purified, fragmented, and subcloned in fragments of size hundreds of bp
d) Genome sequences are assembled from DNA sequence fragments of approximate length 500 bp obtained using DNA sequencing machines
Answer: c [Reason:] Chromosomes of a target organism are purified, fragmented, and subcloned in fragments of size hundreds of kbp and not bp. The BAC fragments are then further subcloned as smaller fragments into plasmid vectors for DNA sequencing.
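Illustration: assembly from overlaps in a redundant set of fragments, as in option a, can be sketched with a greedy merge. This is a toy version under strong assumptions (error-free reads, exact suffix-prefix overlaps, no repeats, a single strand); the function names and the minimum overlap of 3 are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no remaining overlaps; leave contigs separate
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs
```

Repeated elements are what make real assembly much harder than this: identical overlaps can then join fragments that are not adjacent in the genome.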
5. TEs (transposable elements) can at most comprise one-fourth of the genome sequence.
Answer: b [Reason:] TEs (transposable elements) can comprise one-half or more of the genome sequence. Eukaryotic genomes comprise classes of repeated elements, including tandem repeats present in centromeres and telomeres, dispersed tandem repeats (minisatellites and macrosatellites), and interspersed TEs.
6. Gene identification in prokaryotic organisms is simplified by their lacking _____
c) coding segments
d) useful nucleotide sequences
Answer: b [Reason:] Once the sequence patterns that are characteristic of the genes in a particular prokaryotic organism (e.g., codon usage, codon neighbor preference) have been found, gene locations in the genome sequence can be predicted quite accurately. The presence of introns in eukaryotic genomes makes gene prediction more involved because, in addition to the above features, locations of intron–exon and exon–intron splice junctions must also be predicted.
7. Which of the given statements is incorrect?
a) The predicted set of proteins for the genome is referred to as the proteome
b) The amino acid sequence of proteins encoded by the predicted genes is used as a query of the protein sequence databases in a database similarity search
c) A match of a predicted protein sequence to one or more database sequences serves only to identify the gene function but it doesn’t validate the gene prediction
d) The genome sequence is annotated with the information on gene content and predicted structure, gene location, and functional predictions
Answer: c [Reason:] A match of a predicted protein sequence to one or more database sequences not only serves to identify the gene function, but also validates the gene prediction. Pseudogenes, gene copies that have lost function, may also be found in this analysis.
8. Which of the following information is not directly obtained by microarray analysis?
a) Which genes are expressed at a particular stage of the cell cycle
b) Which genes are expressed at a particular stage of developmental cycle of an organism
c) Which genes are depleted at what time
d) Genes that respond to a given environmental signal to the same extent
Answer: c [Reason:] For chronological information, numerous other techniques can be used. This type of information provides an indication as to which genes share a related biological function or may act in the same biochemical pathway and may thereby give clues that will assist in gene identification.
9. Which of the given statements is incorrect about functional genomics?
a) Functional genomics involves the preparation of mutant or transgenic organisms with a mutant form of a particular gene usually designed to prevent expression of the gene
b) Abnormal properties of the mutant organism do not reveal the gene function
c) When two or more members of a gene family are found, rather than a single match to a known gene, the biological activity of these members may be analyzed by functional genomics to look for diversification of function in the family
d) A more detailed analysis of the relative amount of sequence variability in a chromosomal region within populations of closely related species can reveal the presence of genes that are under selection
Answer: b [Reason:] The gene function is revealed by any abnormal properties of the mutant organism. This methodology provides a way to test a gene function that is predicted by sequence similarity to be the same as that of a gene of known function in another organism. If the other organism is very different biologically (comparing a predicted plant or animal gene to a known yeast gene), then functional genomics can also shed light on any newly acquired biological role.
10. Which of the given statements is incorrect about gene maps?
a) Gene order in two related organisms reflects the order that was present in a common ancestor genome. Chromosomal breaks followed by a reassembly of fragments in a different order can produce new gene maps
b) Gene order is only revealed by the physical order of genes on the chromosome
c) Sequence variations (polymorphisms) that are close to (tightly linked) a trait may be used to trace the trait by virtue of the fact that the polymorphism and the trait are seldom separated from one generation to the next
d) These types of evolutionary changes in genomes have been modeled by computational methods
Answer: b [Reason:] Gene order is revealed not only by the physical order of genes on the chromosome, but also by genetic analysis. Populations of an organism show sequence variations that are readily detected by DNA sequencing and other analysis methods. The inheritance of genetic diseases in humans and animals (e.g., cancer and heart disease), and of desirable traits in plants, can be traced genetically by pedigree analysis or genetic crosses.
1. Which of the following is untrue regarding expressed sequence tags (ESTs)?
a) One of the high throughput approaches to genome-wide profiling of gene expression is sequencing ESTs
b) They are short sequences obtained from cDNA clones
c) They serve as short identifiers of full-length genes
d) They are typically in the range of 800 to 900 nucleotides in length
Answer: d [Reason:] ESTs are typically in the range of 200 to 400 nucleotides in length, obtained from either the 5’ end or the 3’ end of cDNA inserts. Libraries of cDNA clones are prepared through reverse transcription of isolated mRNA populations by using oligo(dT) primers that hybridize with the poly(A) tail of mRNAs and ligation of the cDNAs to cloning vectors.
2. To generate EST data, clones in the cDNA library are randomly selected for sequencing from either end of the inserts.
Answer: a [Reason:] The EST data are able to provide a rough estimate of genes that are actively expressed in a genome under a particular physiological condition. This is because the frequencies for particular ESTs reflect the abundance of the corresponding mRNA in a cell, which corresponds to the levels of gene expression at that condition. Another potential benefit of EST sampling is that, by randomly sequencing cDNA clones, it is possible to discover new genes.
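Illustration: the link between EST frequencies and expression levels can be sketched by counting how many ESTs map to each gene. The sketch assumes every EST has already been assigned to a gene identifier; the function name and identifiers are hypothetical.

```python
from collections import Counter


def est_abundance(est_gene_ids):
    """Relative frequency of each gene among the sampled ESTs.

    est_gene_ids: one gene identifier per sequenced EST.
    Returns a dict mapping gene -> fraction of all ESTs.
    """
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}
```

Such fractions are only a rough estimate: rare transcripts may not be sampled at all in a library of limited size.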
3. Which of the following is untrue regarding the drawbacks of expressed sequence tags (ESTs)?
a) They are often of low quality because they are automatically generated without verification
b) Many bases are ambiguously determined, represented by N’s
c) Frame shift errors and artifactual stop codons are some common errors
d) Despite all these errors, translation of the sequences is smooth
Answer: d [Reason:] Common errors also include frameshift errors and artifactual stop codons, resulting in failures of translating the sequences. In addition, there is often contamination by vector sequence, introns (from unspliced RNAs), ribosomal RNA (rRNA), and mitochondrial RNA, among others. ESTs represent only partial sequences of genes.
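Illustration: one simple quality check for ESTs is the fraction of ambiguously determined bases. The sketch below is a hypothetical filter, and the 5% cutoff is an arbitrary value chosen for illustration rather than a standard from any database.

```python
def ambiguous_fraction(seq):
    """Fraction of bases in seq that are not A, C, G, or T (e.g. N)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base not in "ACGT" for base in seq) / len(seq)


def filter_ests(ests, max_ambiguous=0.05):
    """Keep only ESTs whose ambiguous-base fraction is within the cutoff."""
    return [s for s in ests if ambiguous_fraction(s) <= max_ambiguous]
```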
4. It has been estimated that up to 11% of cDNA clones may be chimeric.
Answer: a [Reason:] A problem of ESTs is the presence of chimeric clones owing to cloning artifacts in library construction, in which more than one transcript is ligated in a clone, resulting in the 5’ end of a sequence representing one gene and the 3’ end another gene. Another fundamental problem with EST profiling is that it predominantly represents highly expressed, abundant transcripts. Weakly expressed genes are hardly found in an EST sequencing survey.
5. Which of the following is untrue regarding expressed sequence tags (ESTs)?
a) EST libraries can be easily generated from various cell lines, tissues, organs, and at various developmental stages
b) Although individual ESTs are prone to error, an entire collection of ESTs contains valuable information
c) Identification of cDNA clone is difficult
d) ESTs can also facilitate the unique identification of a gene from a cDNA library
Answer: c [Reason:] A short tag can lead to a cDNA clone. Often, after consolidation of multiple EST sequences, a full-length cDNA can be derived. By searching a non-redundant EST collection, one can identify potential genes of interest.
6. GenBank has a special EST database, dbEST that contains EST collections for a large number of organisms.
Answer: a [Reason:] The rapid accumulation of EST sequences has prompted the establishment of public and private databases to archive the data. The mentioned database is regularly updated to reflect the progress of various EST sequencing projects. Each newly submitted EST sequence is subject to a database search. If a strong similarity to a known gene is found, it is annotated accordingly.
7. Which of the following is untrue regarding EST Index Construction?
a) The goal of the EST databases is to organize and consolidate the largely redundant EST data
b) The process includes a preprocessing step that masks repeats
c) There is no screening of vector contaminants
d) The goal of the EST databases is to improve the quality of the sequence information so the data can be used to extract full-length cDNAs
Answer: c [Reason:] The process includes a preprocessing step that removes vector contaminants and masks repeats. VecScreen can be used to screen out bacterial vector sequences. This is followed by a clustering step that associates EST sequences with unique genes.
8. Which of the following is untrue regarding UniGene?
a) It is an NCBI EST cluster database.
b) Overlapping EST sequences are computationally processed to represent a single expressed gene.
c) Each cluster is a set of overlapping EST sequences
d) The overlapping EST sequences are computationally processed to represent a set of expressed genes
Answer: d [Reason:] The database is constructed based on combined information from dbEST, GenBank mRNA database, and “electronically spliced” genomic DNA. Only ESTs with 3’ poly-A ends are clustered to minimize the problem of chimerism. The resulting 3’ EST sequences provide a more unique representation of the transcripts.
9. Which of the following is untrue regarding TIGR Gene Indices?
a) It is an EST database that uses a similar type of clustering method to UniGene
b) It is an EST database that uses a different clustering method from UniGene
c) It compiles data from dbEST, GenBank mRNA and genomic DNA data, and TIGR’s own sequence database
d) Sequences are only clustered if they are more than 95% identical for over a forty-nucleotide region in pairwise comparisons
Answer: a [Reason:] BLAST and FASTA are used to identify sequence overlaps. In the sequence assembly stage, both TIGR Assembler and CAP3 are used to construct contigs, producing a so-called tentative consensus (TC). To prevent chimerism, transcripts are clustered only if they match fully with known genes.
10. Which of the following is untrue regarding SAGE?
a) It stands for Serial analysis of gene expression
b) It is another high throughput, sequence-based approach for global gene expression profile analysis
c) It stands for Squared analysis of gene expression
d) Unlike EST sampling, SAGE is more quantitative in determining mRNA expression in a cell
Answer: c [Reason:] In this method, short fragments of DNA (usually 15 base pairs [bp]) are excised from cDNA sequences and used as unique markers of the gene transcripts. The sequence fragments are termed tags. They are subsequently concatenated (linked together), cloned, and sequenced.