Multiple choice question for engineering
1. Which of the following is not among the methods for finding localized sequence similarity?
a) Profile Analysis
b) Block Analysis
c) Extraction of Blocks from a Global or Local MSA
d) Pattern blocking
Answer: d [Reason:] Pattern Searching is the correct name of the method for finding localized sequence similarity. This type of analysis was performed on groups of related proteins, and the amino acid patterns that were located may be found in the Prosite catalog.
2. Profiles are found by performing the _____ MSA of a group of sequences and then removing the _______ regions in the alignment into a smaller MSA.
a) local, more highly conserved
b) global, low conserved
c) global, more highly conserved
d) local, low conserved
Answer: c [Reason:] Profiles are found by performing the global MSA of a group of sequences and then removing the more highly conserved regions in the alignment into a smaller MSA. A scoring matrix for the MSA, called a profile, is then made. The profile is composed of columns much like a mini-MSA and may include matches, mismatches, insertions, and deletions.
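The column-by-column nature of a profile can be illustrated with a minimal sketch. It only tabulates symbol frequencies per column; real profile programs turn such counts into scores and add position-specific gap penalties, which is not attempted here. The function name `column_profile` and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def column_profile(msa):
    """Return one dict per alignment column mapping symbol -> relative frequency."""
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in msa)
        profile.append({symbol: n / len(msa) for symbol, n in counts.items()})
    return profile


# Hypothetical conserved region cut out of a larger global MSA ('-' is a gap).
mini_msa = ["ACD-K", "ACDEK", "GCD-K", "ACDEK"]
profile = column_profile(mini_msa)
for position, column in enumerate(profile):
    print(position, column)
```

Column 3 of the toy alignment mixes a residue with a gap, which is the kind of insert or delete position a profile may contain.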
3. The program Profilemake can be used to produce a profile from an MSA.
a) True
b) False
Answer: a [Reason:] A version of the Profilesearch program, which performs a database search for matches to a profile, is available at the University of Pittsburgh Supercomputer Center. A special grant application may be needed to use this facility. Profile-generating programs are available by FTP and are included in the Genetics Computer Group suite of programs.
4. Which of the following is untrue regarding the block analysis method?
a) Blocks represent a conserved region in the MSA
b) Blocks differ from profiles in lacking insert and delete positions in the sequences
c) Every column includes only matches and mismatches
d) Blocks may be made by searching for a section of an MSA alignment that is low conserved
Answer: d [Reason:] Like profiles, blocks may be made by searching for a section of an MSA alignment that is highly conserved. However, aligned regions may also be found by searching each sequence in turn for similar patterns of the same length. These patterns may include a region with one or a few matching characters followed by a short spacer region of unmatched characters and then by another set of a few matching characters, and so on, until the sequences start to be different.
5. Block analysis methods use substitution matrices such as the PAM and BLOSUM matrices to score matches.
a) True
b) False
Answer: b [Reason:] These methods do not use substitution matrices such as the PAM and BLOSUM matrices to score matches. Rather, they are based on finding exact matches that have the same spacing in at least some of the input sequences, and that may be repeated in a given sequence.
6. In the method of extraction of blocks from a global or local MSA, a global MSA of related protein sequences usually includes regions that have been aligned without gaps in any of the sequences.
a) True
b) False
Answer: a [Reason:] These ungapped patterns may be extracted from these aligned regions and used to produce blocks. Blocks found in this manner are only as good as the MSA from which they are derived. A global MSA of related protein sequences usually includes regions that have been aligned without gaps in any of the sequences.
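As a rough sketch of this extraction step, the code below scans an alignment for runs of columns in which no sequence has a gap and reports them as candidate blocks. The function name, the `min_width` threshold, and the toy alignment are assumptions for illustration; this is not how the BLOCKS server itself is implemented.

```python
def ungapped_blocks(msa, min_width=3):
    """Return (start, end) column ranges, end exclusive, with no gap in any row."""
    width = len(msa[0])
    blocks, start = [], None
    # Iterate one step past the last column so a trailing run is closed too.
    for col in range(width + 1):
        gap_free = col < width and all(row[col] != "-" for row in msa)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


# Hypothetical global MSA of three related sequences.
msa = ["MKV-LAGHTT", "MRVQLAGH-T", "MKI-LSGHTT"]
blocks = ungapped_blocks(msa)
segments = [[row[s:e] for row in msa] for s, e in blocks]
print(blocks)
print(segments)
```

Because the blocks are read directly off the alignment, any misaligned column in the input carries straight through, which is why blocks found this way are only as good as the MSA.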
7. Which of the following is not true regarding the BLOCKS?
a) Blocks of width 10–55 are extracted from a protein MSA
b) The protein MSA is up to 400 sequences
c) The program doesn’t accept manually reformatted MSAs
d) The program accepts FASTA format
Answer: c [Reason:] The program accepts FASTA, CLUSTAL, or MSF formats, or manually reformatted MSAs. Several types of analyses may be performed with such extracted blocks. The BLOCKS server primarily generates blocks from unaligned sequences. The eMOTIFs server similarly extracts motifs from MSAs in several MSA formats and provides a formatter for additional MSA formats.
8. The pattern searching type of analysis was performed on groups of related proteins, and the amino acid patterns that were located may be found in the Prosite catalog.
a) True
b) False
Answer: a [Reason:] The Prosite catalog groups proteins that have similar biochemical functions on the basis of amino acid patterns such as those in the active site. Subsequently, these families were searched for amino acid patterns by the MOTIF program (Smith et al. 1990), which finds patterns of the type aa1 d1 aa2 d2 aa3, where aa1, aa2, and aa3 are conserved amino acids and d1 and d2 are stretches of intervening sequence up to 24 amino acids long.
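A spaced pattern of the form aa1 d1 aa2 d2 aa3 can be expressed as a regular expression, which gives a feel for what such a search looks for. The sketch below uses Python's standard `re` module; the helper `spaced_pattern` and the example sequences are hypothetical, and the actual MOTIF program discovers the patterns rather than being handed one.

```python
import re


def spaced_pattern(aa1, d1, aa2, d2, aa3):
    """Compile aa1, then d1 arbitrary residues, aa2, d2 arbitrary residues, aa3."""
    return re.compile(f"{aa1}.{{{d1}}}{aa2}.{{{d2}}}{aa3}")


pattern = spaced_pattern("C", 2, "H", 3, "C")  # regex: C.{2}H.{3}C

# Hypothetical protein fragments; only some contain the spaced pattern.
sequences = {"s1": "MACKLHQRSCW", "s2": "CAAHGGGC", "s3": "CAHGGGC"}

hits = {}
for name, seq in sequences.items():
    match = pattern.search(seq)
    if match is not None:
        hits[name] = match.start()  # 0-based position of aa1

print(hits)
```

Here the pattern is found in two of the three sequences, which mirrors the idea that a true motif should recur across a number of the input sequences.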
9. Although the MOTIF program was used successfully for making the BLOCKS database, it is limited in the pattern sizes that can be found.
a) True
b) False
Answer: a [Reason:] The MOTIF program distinguishes true motifs from random background patterns by requiring that motifs occur in a number of the input sequences and tend not to be internally repeated in any one sequence. As the length of the motif increases, there are many possible combinations of patterns of a given length where only a few characters match.
10. Which of the following is not true regarding the BLOCKS?
a) The BLOCKS server can extract a conserved, ungapped region from a MSA to produce a sequence block
b) The server can also find blocks in a set of unaligned, input sequences and maintains a large database of blocks based on an analysis of proteins in the Prosite catalog
c) Blocks are found by the Protomat program
d) The program MOTIF doesn’t locate spaced patterns
Answer: d [Reason:] Blocks are found in two steps. First, the program MOTIF is used to locate spaced patterns. The second step takes the best and most consistent patterns found in step 1 and uses the program MOTOMAT to merge overlapping triplets and extend them, orders the resulting blocks, and chooses those that are in the largest subset of sequences.
1. Which of the following is a wrong statement about the maximum likelihood approach?
a) This method doesn’t always involve probability calculations
b) It finds a tree that best accounts for the variation in a set of sequences
c) The method is similar to the maximum parsimony method
d) The analysis is performed on each column of a multiple sequence alignment
Answer: a [Reason:] This method involves probability calculations to find a tree that best accounts for the variation in a set of sequences. All possible trees are considered. Hence, the method is only feasible for a small number of sequences.
2. In the maximum likelihood approach, for each tree, the number of sequence changes or mutations that may have occurred to give the sequence variation is considered.
a) True
b) False
Answer: a [Reason:] Because the rate of appearance of new mutations is very small, the more mutations needed to fit a tree to the data, the less likely that tree (Felsenstein 1981). The maximum likelihood method resembles the maximum parsimony method in that trees with the least number of changes will be the most likely.
3. The maximum likelihood method can be used to explore relationships among more diverse sequences, conditions that are not well handled by maximum parsimony methods.
a) True
b) False
Answer: a [Reason:] The maximum likelihood method presents an additional opportunity to evaluate trees with variations in mutation rates in different lineages. Also it provides opportunity to use explicit evolutionary models such as the Jukes-Cantor and Kimura models with allowances for variations in base composition.
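To make the idea of an explicit evolutionary model concrete, here is a minimal sketch of the Jukes-Cantor correction, which converts the observed fraction of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). It is a stand-alone illustration of the model, not a maximum likelihood tree search; the function name and example sequences are assumptions.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Estimate substitutions per site between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


observed = 1 / 4  # one difference in four sites
corrected = jukes_cantor_distance("ACGT", "ACGA")
print(observed, round(corrected, 4))
```

The corrected distance exceeds the observed fraction because the model allows for multiple substitutions at the same site.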
4. The main disadvantage of maximum likelihood methods is that they are _____
a) mathematically less folded
b) mathematically less complex
c) computationally lucid
d) computationally intense
Answer: d [Reason:] The main disadvantage of maximum likelihood methods is that they are computationally intense. However, with faster computers, the maximum likelihood method is seeing wider use and is being used for more complex models of evolution.
5. Maximum likelihood has also been used for an analysis of mutations in overlapping reading frames in viruses.
a) True
b) False
Answer: a [Reason:] PAUP version 4 can be used to perform a maximum likelihood analysis on DNA sequences. The method has also been applied for changes from one amino acid to another in protein sequences.
6. Which of the following is a wrong statement about DNAML and DNAMLK?
a) PHYLIP includes these two programs for maximum likelihood analysis
b) DNAML estimates phylogenies from nucleotide sequences by the maximum likelihood method
c) DNAMLK estimates phylogenies in the same manner as DNAML
d) DNAMLK estimates phylogenies without assuming a molecular clock
Answer: d [Reason:] DNAMLK estimates phylogenies from nucleotide sequences by the maximum likelihood method in the same manner as DNAML, but assumes a molecular clock. DNAML allows for variable frequencies of the four nucleotides, for unequal rates of transitions and transversions, and for different rates of change in different categories of sites, as specified by the program.
7. Which of the following is a wrong statement about the maximum likelihood method’s steps?
a) It starts with an evolutionary model of sequence change that provides estimates of rates of substitution of one base for another
b) In the beginning there is an evolutionary model of sequence change that provides estimates of transitions and transversions in a set of nucleic acid sequences
c) The rates of all possible substitutions are chosen so that the base composition differs
d) The set of sequences is then aligned
Answer: c [Reason:] The rates of all possible substitutions are chosen so that the base composition remains the same. The set of sequences is then aligned, and the substitutions in each column are examined for their fit to a set of trees that describe possible phylogenetic relationships among the sequences.
8. Once all positions in the sequence alignment have been examined, the likelihoods given by each column in the alignment for each tree are _____ to give the likelihood of the tree.
Answer: a [Reason:] The likelihoods given by each column are multiplied to give the likelihood of the tree. Because these likelihoods are very small numbers, in practice their logarithms are added to give the log likelihood of each tree. The most likely tree given the data is then identified.
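The equivalence between multiplying per-column likelihoods and adding their logarithms can be sketched in a few lines. The per-column values below are made-up numbers for two hypothetical trees; computing them from a real evolutionary model is outside the scope of this sketch.

```python
import math


def tree_log_likelihood(column_likelihoods):
    """Sum the log of each per-column likelihood for one candidate tree."""
    return sum(math.log(p) for p in column_likelihoods)


# Hypothetical per-column likelihoods for two candidate trees.
candidates = {
    "tree_a": [0.012, 0.020, 0.015, 0.018],
    "tree_b": [0.010, 0.020, 0.009, 0.018],
}

log_likelihoods = {name: tree_log_likelihood(cols) for name, cols in candidates.items()}
best_tree = max(log_likelihoods, key=log_likelihoods.get)
print(log_likelihoods, best_tree)
```

Working in log space avoids numerical underflow when an alignment has many columns, since the raw product shrinks toward zero very quickly.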
9. A method of sequence alignment based on a model (Bishop and Thompson 1986) was introduced that predicts the manner in which DNA sequences change during evolution. Which of the following is wrong about it?
a) The basis of this method is to devise a scheme for introducing substitutions, insertions, and gaps into sequences
b) The basis of this method is to provide a probability that each of these changes occurs over certain periods of evolutionary time
c) Given each of these predicted changes, the method examines all the possible combinations of mutations to change one sequence into another
d) Multiple combinations are selected that will be the most likely over time
Answer: d [Reason:] One of these combinations will be the most likely one over time and that is selected. Once this combination has been determined, a sequence alignment and the distance between the sequences will be known.
10. A method of sequence alignment based on a model (Bishop and Thompson 1986) was introduced that predicts the manner in which DNA sequences change during evolution. Which of the following is wrong about it?
a) This method is different from the Smith-Waterman local alignment algorithm
b) This method is quite similar to the Smith-Waterman local alignment algorithm
c) The underlying mutational theory is like those used to produce the PAM matrices for predicting changes in DNA and protein sequences
d) Sequences are predicted to change by a Markov process such that each mutation in the sequence is independent of previous mutations at that site or at other sites
Answer: b [Reason:] This method differs from the Smith-Waterman local alignment algorithm in that it identifies the most probable (maximum likelihood) alignment based on an evolutionary model of change in sequences, as opposed to a score based on observed substitutions in related proteins and a gap scoring system. As an example of option d, a given nucleotide at any sequence position can mutate into another at the same rate, or may not change at all, during a period of evolutionary time.
1. Which of the following is untrue regarding the maximum parsimony method?
a) This method predicts the evolutionary tree
b) It minimizes the number of steps required to generate the observed variation in the sequences
c) The method is also sometimes referred to as the minimum evolution method
d) Only a pairwise sequence alignment is required to predict which sequence positions are likely to correspond
Answer: d [Reason:] A multiple sequence alignment is required to predict which sequence positions are likely to correspond. These positions will appear in vertical columns in the multiple sequence alignment. For each aligned position, phylogenetic trees that require the smallest number of evolutionary changes to produce the observed sequence changes are identified.
2. Which of the following is untrue regarding the maximum parsimony method?
a) The analysis steps are continued for every position in the sequence alignment
b) This method is used for large numbers of sequences
c) Those trees that produce the smallest number of changes overall for all sequence positions are identified
d) This method is used for sequences that are quite similar
Answer: b [Reason:] The algorithm followed is not particularly complicated, but it is guaranteed to find the best tree, because all possible trees relating a group of sequences are examined. For this reason, the method is quite time-consuming and is not useful for data that include a large number of sequences or sequences with a large amount of variation.
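The counting step can be sketched with Fitch's small-parsimony procedure, which gives the minimum number of changes a fixed tree needs for one column; summing over columns and comparing candidate trees then picks the most parsimonious one. The nested-tuple tree encoding, the function names, and the four toy sequences are assumptions for illustration, and the three rooted shapes below stand in for the three possible groupings of four sequences.

```python
def fitch_changes(tree, column):
    """Return (possible ancestral states, minimum changes) for one column.

    `tree` is a leaf name or a pair of subtrees; `column` maps leaf -> base.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_changes(left, column)
    right_states, right_changes = fitch_changes(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one substitution is required at this node.
    return left_states | right_states, left_changes + right_changes + 1


def tree_score(tree, alignment):
    """Total minimum changes over all columns of the alignment."""
    width = len(next(iter(alignment.values())))
    total = 0
    for col in range(width):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch_changes(tree, column)[1]
    return total


alignment = {"s1": "AAGC", "s2": "AAGC", "s3": "GAGT", "s4": "GATT"}
trees = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]
scores = [tree_score(tree, alignment) for tree in trees]
print(scores)
```

With four sequences there are only three groupings to score, but the number of trees grows very rapidly with the number of sequences, which is the time cost the answer refers to.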
3. Which of the following is untrue regarding the programs for analysis of nucleic acid sequences?
a) DNAPARS treats gaps as a fifth nucleotide state
b) DNAPENNY performs parsimonious phylogenies by branch-and-bound search
c) DNAPENNY can analyze up to 11 or 12 sequences
d) Compatibility criterion is not involved in DNACOMP
Answer: d [Reason:] DNACOMP performs phylogenetic analysis using the compatibility criterion. Rather than searching for overall parsimony at all sites in the multiple sequence alignment, this method finds the tree that supports the largest number of sites. This method is recommended when the rate of evolution varies among sites.
4. PROTPARS counts the minimum number of mutations to change a codon for the first amino acid into a codon for the second amino acid, but only scores those mutations in the mutational path that actually change the amino acid.
a) True
b) False
Answer: a [Reason:] PROTPARS is used for analysis of protein sequences. Silent mutations that do not change the amino acid are not scored, on the grounds that they have little evolutionary significance.
5. Parsimony can give ______ information when rates of sequence change ____ in the different branches of a tree that are represented by the sequence data.
a) misleading, vary
b) useful, change
c) misleading, are constant
d) sometimes contradicting, are constant
Answer: a [Reason:] These variations produce a range of branch lengths, long ones representing more extended periods of time and short ones representing shorter times. Although other columns in the sequence alignment that show less variation may provide the correct tree, the columns representing greater variation dominate the analysis.
6. Which of the following is untrue regarding the distance methods?
a) The sequence pairs that have the largest number of sequence changes between them are termed ‘neighbors’
b) On a tree, these sequences share a node or common ancestor position and are each joined to that node by a branch
c) It produces a phylogenetic tree of the group
d) It employs the number of changes between each pair in a group of sequences
Answer: a [Reason:] Neighbors are the sequence pairs that have the smallest, not the largest, number of sequence changes between them. The goal of distance methods is to identify a tree that positions the neighbors correctly and that also has branch lengths which reproduce the original data as closely as possible. Finding the closest neighbors among a group of sequences by the distance method is often the first step in producing a multiple sequence alignment.
7. Which of the following is untrue regarding the distance methods?
a) The distance method was pioneered by Feng and Doolittle
b) A collection of programs by authors Feng and Doolittle will produce both an alignment and tree of a set of protein sequences
c) The program CLUSTALW uses the neighbor-joining distance method as a guide to multiple sequence alignments
d) DNADIST is not one of the programs of the PHYLIP package
Answer: d [Reason:] DNADIST and PROTDIST are the programs of the PHYLIP package that perform a distance analysis. They automatically read in sequences in the PHYLIP infile format and automatically produce a file called outfile with a distance table.
8. Which of the following is untrue regarding the Distance analysis programs in PHYLIP?
a) FITCH estimates a phylogenetic tree assuming additivity of branch lengths
b) FITCH uses the Fitch-Margoliash method
c) FITCH assumes a molecular clock but KITSCH does not
d) NEIGHBOR estimates phylogenies using the neighbor-joining or unweighted pair group method with arithmetic mean (UPGMA)
Answer: c [Reason:] KITSCH assumes a molecular clock but FITCH does not. Also, in NEIGHBOR the neighbor-joining method does not assume a molecular clock and produces an unrooted tree. The UPGMA method assumes a molecular clock and produces a rooted tree.
9. Which of the following is untrue regarding the neighbor-joining method?
a) It is very much like the Fitch-Margoliash method
b) It is totally dissimilar to the Fitch-Margoliash method
c) It is especially suitable when the rate of evolution of the separate lineages under consideration varies
d) When the branch lengths of trees of known topology are allowed to vary in a manner that simulates varying levels of evolutionary change, it is the most reliable method
Answer: b [Reason:] The neighbor-joining method is very much like the Fitch-Margoliash method except that the choice as to which sequences to pair is determined by a different algorithm. In the situation mentioned in option d, the neighbor-joining method and the Sattath and Tversky method are the most reliable in predicting the correct tree.
10. Neighbor-joining chooses the sequences that should be joined to give the best least-squares estimates of the branch lengths that most closely reflect the actual distances between the sequences.
a) True
b) False
Answer: a [Reason:] It is not necessary to compare all possible trees to find the least squares fit as in the Fitch-Margoliash method. The method pairs sequences based on the effect of the pairing on the sum of the branch lengths of the tree.
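One common way to write the pairing rule is the Q criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all other sequences; the pair with the smallest Q is joined first. The sketch below performs only this first pairing step on a made-up five-sequence distance table. The function name and data are assumptions, and this is not the code of the NEIGHBOR program.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """Return the pair to join first, its Q value, and the two branch lengths."""
    n = len(names)
    # r[i]: sum of distances from i to every other sequence.
    r = {i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names}

    def q(i, j):
        return (n - 2) * dist[frozenset((i, j))] - r[i] - r[j]

    i, j = min(combinations(names, 2), key=lambda pair: q(*pair))
    d_ij = dist[frozenset((i, j))]
    length_i = d_ij / 2 + (r[i] - r[j]) / (2 * (n - 2))
    return (i, j), q(i, j), (length_i, d_ij - length_i)


names = ["a", "b", "c", "d", "e"]
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

pair, q_value, branch_lengths = nj_pick_pair(names, dist)
print(pair, q_value, branch_lengths)
```

In this example the pair with the smallest raw distance (d and e) is not the pair chosen, which shows that the criterion weighs each pairing against the rest of the tree rather than simply joining the closest sequences.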
1. Iterative methods include repeatedly realigning subgroups of the sequences and then aligning these subgroups into a local alignment of all of the sequences.
a) True
b) False
Answer: b [Reason:] Subgroups are aligned into a global alignment of all of the sequences. The objective is to improve the overall alignment score, such as a sum of pairs score. Selection of these groups may be based on the ordering of the sequences on a phylogenetic tree predicted in a manner similar to that of progressive alignment, separation of one or two of the sequences from the rest, or a random selection of the groups.
2. Which of the following is incorrect regarding PRRP?
a) The program PRRP uses iterative methods to produce an alignment
b) An initial pair-wise alignment is made to predict a tree
c) Only one cycle is performed
d) The whole process is repeated until there is no further increase in the alignment score
Answer: c [Reason:] As mentioned, an initial pair-wise alignment is made to predict a tree, the tree is used to produce weights for making alignments in the same manner as MSA except that the sequences are analyzed for the presence of aligned regions that include gaps rather than being globally aligned, and these regions are iteratively recalculated to improve the alignment score. The best scoring alignment is then used in a new cycle of calculations to predict a new tree, new weights, and new alignments.
3. In the program DIALIGN, pairs of sequences are aligned to locate aligned regions that do not include gaps, much like continuous diagonals in a dot matrix plot.
a) True
b) False
Answer: a [Reason:] The program DIALIGN finds an alignment by a different iterative method. Pairs of sequences are aligned to locate aligned regions that do not include gaps, much like continuous diagonals in a dot matrix plot. Diagonals of various lengths are identified. A consistent collection of weighted diagonals that provides an alignment which is a maximum sum of weights is then found.
4. The Genetic Algorithm method has been recently adapted for MSA (Multiple Sequence Alignment) by Corpet (1998).
a) True
b) False
Answer: b [Reason:] The genetic algorithm is a general type of machine-learning algorithm that has no direct relationship to biology and that was invented by computer scientists. The method has been recently adapted for MSA (Multiple Sequence Alignment) by Notredame and Higgins (1996) in a computer program package called SAGA (Sequence Alignment by Genetic Algorithm).
5. An approach for obtaining a higher-scoring MSA (Multiple Sequence Alignment) by rearranging an existing alignment uses a probability approach called simulated annealing.
a) True
b) False
Answer: a [Reason:] The program MSASA (Multiple Sequence Alignment by Simulated Annealing) starts with a heuristic MSA (Multiple Sequence Alignment). Further, it changes the alignment by following an algorithm designed to identify changes that increase the alignment score.
6. The first step in the Genetic Algorithm is arranging the sequences to be aligned in rows.
a) True
b) False
Answer: a [Reason:] The sequences to be aligned are written in rows, as on a page, except that they are made to overlap by a random amount of sequence, up to 50 residues long for sequences about 200 in length. The ends are then padded with gaps. A typical population of 100 of these MSAs is made, although other numbers may be set.
7. The second step in the Genetic Algorithm consists of scoring the 100 initial MSAs by the sum of pairs method.
a) True
b) False
Answer: a [Reason:] The 100 initial MSAs are scored by the sum of pairs method, except that both natural and quasi-natural gap-scoring schemes are used. Recall that the best SSP score for a MSA is the minimum one and the one that is closest to the sum of the pair-wise sequence alignment. Standard amino acid scoring matrices and gap opening and extension penalties are used.
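The sum of pairs idea can be sketched by scoring every pair of symbols in every column and adding the results. The sketch below uses a simple similarity convention in which a higher total is better, whereas the cost-based convention mentioned above treats the minimum as best. The match, mismatch, and gap values are arbitrary assumptions and stand in for a real amino acid scoring matrix with gap opening and extension penalties.

```python
from itertools import combinations


def sum_of_pairs(msa, match=1, mismatch=-1, gap=-2):
    """Score an MSA by summing pairwise symbol scores over all columns."""

    def pair_score(a, b):
        if a == "-" and b == "-":
            return 0  # two gaps facing each other are not penalized here
        if a == "-" or b == "-":
            return gap
        return match if a == b else mismatch

    return sum(
        pair_score(a, b)
        for column in zip(*msa)
        for a, b in combinations(column, 2)
    )


# Hypothetical three-sequence alignment.
msa = ["ACG", "A-G", "TCG"]
score = sum_of_pairs(msa)
print(score)
```

For the toy alignment the three columns contribute -1, -3, and 3 respectively.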
8. In Genetic Algorithm, in the mutation process _______
a) sequence is changed
b) gaps are not inserted
c) sequence is not changed
d) gaps are not rearranged
Answer: c [Reason:] In the mutation process, the sequence is not changed (else it would no longer be an alignment), but gaps are inserted and rearranged in an attempt to create a better-scoring MSA. In the gap insertion process, the sequences in a given MSA are divided into two groups based on an estimated phylogenetic tree, and gaps of random length are inserted into random positions in the alignment.
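A gap-insertion mutation of this kind can be sketched as follows: a gap of random length is inserted at one random position in the first group of sequences and at another random position in the second group, so every row grows by the same amount and no residue is altered. The group split is supplied by the caller here rather than derived from an estimated tree, and the function name and parameters are assumptions; this is not SAGA's own code.

```python
import random


def insert_gaps(msa, group_a, rng, max_gap=3):
    """Return a mutated copy of `msa` with one gap inserted per sequence.

    Rows whose index is in `group_a` share one insertion point; all other
    rows share a second one. Residues themselves are never changed.
    """
    width = len(msa[0])
    gap = "-" * rng.randint(1, max_gap)
    pos_a = rng.randint(0, width)
    pos_b = rng.randint(0, width)
    mutated = []
    for index, row in enumerate(msa):
        pos = pos_a if index in group_a else pos_b
        mutated.append(row[:pos] + gap + row[pos:])
    return mutated


rng = random.Random(0)  # fixed seed so the sketch is repeatable
parent = ["ACDEF", "ACD-F", "GCDEF", "GC-EF"]
child = insert_gaps(parent, group_a={0, 1}, rng=rng)
print(child)
```

In a full genetic algorithm the child alignment would then be scored and kept or discarded according to how it compares with the rest of the population.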
9. The HMM is a statistical model that considers few combinations of matches and gaps to generate an alignment of a set of sequences.
a) True
b) False
Answer: b [Reason:] The HMM is a statistical model that considers all possible combinations of matches, mismatches, and gaps to generate an alignment of a set of sequences. A localized region of similarity, including insertions and deletions, may also be modeled by an HMM.
10. Which of the following is not true about iterative methods?
a) The Genetic Algorithm is one of the methods used under this approach
b) Hidden Markov Models are used for Multiple Sequence Alignment
c) The objective is to improve the overall alignment score
d) MultAlin recalculates global scores
Answer: d [Reason:] MultAlin (Corpet 1988) recalculates pair-wise scores, not global scores, during the production of a progressive alignment. In addition, it uses these scores to recalculate the tree, which is then used to refine the alignment in an effort to improve the score.
1. The overall goal of pairwise sequence alignment is to find the best pairing of two sequences, such that there is maximum correspondence among residues.
a) True
b) False
Answer: a [Reason:] The goal of pairwise sequence alignment is to find the best pairing. To achieve this goal, one sequence needs to be shifted relative to the other to find the position where the maximum number of matches is found. There are two different alignment strategies that are often used: global alignment and local alignment.
2. In local alignment, the two sequences to be aligned cannot be of unequal lengths.
a) True
b) False
Answer: b [Reason:] The two sequences to be aligned can be of different lengths. This approach is more appropriate for aligning divergent biological sequences containing only modules that are similar, which are referred to as domains or motifs. This approach can be used for aligning more divergent sequences with the goal of searching for conserved patterns in DNA or protein sequences.
3. Alignment algorithms, both global and local, are fundamentally similar and only differ in the optimization strategy used in aligning similar residues.
a) True
b) False
Answer: a [Reason:] Both types of algorithms can be based on one of the three methods: the dot matrix method, the dynamic programming method, and the word method. The word method is used in fast database similarity searching.
4. In a dot matrix, two sequences to be compared are written in the _____________ of the matrix.
a) horizontal and vertical axes
b) 2 parallel horizontal axes
c) 2 parallel vertical axes
d) horizontal axis (one preceding another)
Answer: a [Reason:] The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence. If a residue match is found, a dot is placed within the graph. Otherwise, the matrix positions are left blank.
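The procedure just described is small enough to sketch directly: one sequence runs along the horizontal axis, the other along the vertical axis, and a dot is placed wherever the two residues match. The function names and the text rendering are assumptions made for illustration.

```python
def dot_matrix(seq_x, seq_y):
    """Return rows of booleans: row j, column i is True when seq_x[i] == seq_y[j]."""
    return [[a == b for a in seq_x] for b in seq_y]


def render(matrix):
    """Draw the matrix as text, '*' for a dot and '.' for a blank position."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


# Hypothetical short DNA sequences; seq_x is horizontal, seq_y is vertical.
matrix = dot_matrix("GATTACA", "GATCACA")
print(render(matrix))
```

Even on sequences this short, scattered dots appear away from the main diagonal, because with only four bases many positions match by chance.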
5. When the two sequences have substantial regions of similarity, many dots line up to form contiguous _______ lines.
a) crossings on
Answer: c [Reason:] The dots line up to form contiguous diagonal lines, which reveal the sequence alignment. If there are interruptions in the middle of a diagonal line, they indicate insertions or deletions. Parallel diagonal lines within the matrix represent repetitive regions of the sequences.
6. A problem exists when comparing _____ sequences using the dot matrix method, namely, the _______
a) small, amplification
b) large, amplification
c) small, high noise level
d) large, high noise level
Answer: d [Reason:] In most dot plots, dots are plotted all over the graph obscuring identification of the true alignment. For DNA sequences, the problem is particularly acute because there are only four possible characters in DNA and each residue therefore has a one-in-four chance of matching a residue in another sequence.
7. If the selected window size is too long, sensitivity of the alignment is lost.
a) True
b) False
Answer: a [Reason:] Dots are only placed when a stretch of residues equal to the window size from one sequence matches completely with a stretch of another sequence. This method has been shown to be effective in reducing the noise level. The window is also called a tuple, the size of which can be manipulated so that a clear pattern of sequence match can be plotted. However, if the selected window size is too long, sensitivity of the alignment is lost.
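The effect of the window size can be sketched by placing a dot only when a whole stretch of residues matches exactly. The function name and the two short sequences are assumptions for illustration; real dot plot programs may also allow a mismatch threshold within the window, which is not modeled here.

```python
def windowed_dots(seq_x, seq_y, window):
    """Return the set of (i, j) where seq_x[i:i+window] equals seq_y[j:j+window]."""
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            if seq_x[i:i + window] == seq_y[j:j + window]:
                dots.add((i, j))
    return dots


seq_x, seq_y = "GATTACA", "ATTACCA"
noisy = windowed_dots(seq_x, seq_y, window=1)     # every single-residue match
filtered = windowed_dots(seq_x, seq_y, window=3)  # only 3-residue exact matches
too_long = windowed_dots(seq_x, seq_y, window=6)  # window exceeds the shared region
print(len(noisy), sorted(filtered), too_long)
```

With a window of 1 the plot holds 15 dots, a window of 3 leaves only the three dots on the true diagonal, and a window of 6 finds nothing at all because the shared region is only five residues long.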
8. A sequence can be aligned with itself to identify internal repeat elements.
a) True
b) False
Answer: a [Reason:] In the self comparison, there is a main diagonal for perfect matching of each residue. If repeats are present, short parallel lines are observed above and below the main diagonal.
9. Self-complementarity of DNA sequences cannot be identified using a dot plot.
a) True
b) False
Answer: b [Reason:] Self-complementary regions of a DNA sequence, also called inverted repeats, can be identified using a dot plot; an example is the repeats that form the stem of a hairpin structure. In this case, a DNA sequence is compared with its reverse-complemented sequence. Parallel diagonals represent the inverted repeats.
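The comparison against the reverse complement can be sketched as follows. Instead of drawing the plot, the code lists the words of a given length that occur both in the sequence and in its reverse complement, which are the stretches that would show up as diagonals. The function names and the hairpin-like test sequence are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def words(seq, size):
    """Return the set of all substrings of length `size`."""
    return {seq[i:i + size] for i in range(len(seq) - size + 1)}


def inverted_repeat_words(seq, size):
    """Words of length `size` shared by `seq` and its reverse complement."""
    return sorted(words(seq, size) & words(reverse_complement(seq), size))


# Hypothetical hairpin: a 6-base stem, a TTTT loop, and the complementary stem.
hairpin = "GCGGAT" + "TTTT" + "ATCCGC"
stems = inverted_repeat_words(hairpin, 6)
print(stems)
```

The two words reported are the two arms of the stem, each being the reverse complement of the other.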
10. Which of the following is untrue about dot plot method and its applications?
a) This method gives a direct visual statement of the relationship between two sequences
b) One of its advantages is identification of sequence repeat regions based on the presence of parallel diagonals of the same size vertically or horizontally in the matrix
c) It is not useful in identifying chromosomal repeats
d) The method can be used in identifying nucleic acid secondary structures through detecting self-complementarity of a sequence
Answer: c [Reason:] It is useful in identifying chromosomal repeats and in comparing gene order conservation between two closely related genomes. The dot matrix method gives a direct visual statement of the relationship between two sequences and helps easy identification of the regions of greatest similarities. The method thus has some applications in genomics.