Multiple choice question for engineering
1. Proteins are dynamic entities that undergo _______
a) only fluctuation of flexible loop regions about equilibrium positions when in solution
b) only limited conformational change of amino acid side-chains when in solution
c) both limited conformational change of amino acid side-chains and fluctuation of flexible loop regions about equilibrium positions when in solution
d) illimitable or total conformational change of amino acid side-chains when in solution
Answer: c [Reason:] Such flexibility can often be adequately treated by ‘soft’ potentials or limited conformational flexibility and/or refinement of side-chains on docking. However, proteins often undergo more extensive conformational changes which may involve large-scale motions of domains relative to one another or possibly conformational change involving order-disorder transitions.
2. Typical motions of proteins will be primarily treated by a rigid body model for docking.
Answer: b [Reason:] These types of motions will be poorly treated by a rigid body model for docking. Again, a distinction is often made between the general treatment of protein-protein docking and protein-ligand docking.
3. In protein-ligand docking, ______ ligands are often _____ in adapting their shape to fit the receptor binding pocket.
a) small molecule, highly flexible
b) large molecule, highly flexible
c) large molecule, more flexible
d) small molecule, less flexible
Answer: a [Reason:] The degree of conformational flexibility of a small molecule ligand (substrate, cofactor or inhibitor) can be considerable, particularly where there are multiple torsion angles. This presents a major challenge in protein-ligand docking, and several different approaches have been used to solve this problem.
4. In the multiple-conformation rigid-body method, a ligand is assumed to be able to adopt a number (N) of different _______ that are computed _____ the ligand being docked into the receptor.
a) low-energy conformations, after
b) low-energy conformations, prior to
c) high-energy conformations, prior to
d) high-energy conformations, after
Answer: b [Reason:] These N low-energy conformations are then docked individually into the receptor, each treated as a rigid conformation, using a descriptor-based approach. The scoring function is used to determine which of the resulting solutions is optimal.
5. The disadvantage of the multiple-conformation rigid-body method is that the active conformation may be missed as the result of a minor structural difference not considered in the ___ ligand conformations, where N is the number of low-energy conformations.
a) N +1
Answer: b [Reason:] The disadvantage is that the active conformation may be missed as the result of a minor structural difference not considered in the N ligand conformations. The advantage of this approach is that the search can be restricted to a smaller number of relevant ligand conformations.
6. Stochastic search methods include Monte Carlo simulation, simulated annealing, Tabu search, genetic algorithms and evolutionary programming.
Answer: a [Reason:] Stochastic processes use a random sampling procedure to search conformational space. The ligand molecule performs a random walk in space in the receptor cavity. Usually, the ligand is placed in a random orientation in the receptor cavity. Then at each step a small displacement is made in any of the degrees of freedom of the ligand molecule (translation, rotation or torsion angle).
7. In a Monte Carlo simulation the score (or energy) is calculated at each step and compared to the previous step. The probability of accepting the step is given by ______ where ΔE is the difference in energy; kB is Boltzmann’s constant and T the temperature.
Answer: [Reason:] The acceptance probability takes the Boltzmann form P(ΔE) = exp(−ΔE / (kB T)). If the new energy is lower the step is accepted; otherwise the result is treated probabilistically by a Boltzmann mechanism: if P(ΔE) is greater than a random number generated between 0 and 1, the step is accepted. The higher the temperature (or the smaller ΔE at a given T), the higher the likelihood that the step is accepted. A conventional Monte Carlo simulation proceeds at constant temperature, whilst in simulated annealing the temperature is gradually lowered during the simulation in an attempt to locate a globally optimal solution. In simulated annealing the computer stores a single solution and generates a new solution randomly.
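The acceptance rule and the cooling schedule described above can be sketched as follows. This is a minimal illustration, not a docking implementation: the function names, the one-dimensional toy energy function, the geometric cooling factor and the choice of kcal/(mol·K) units for kB are all assumptions made for the example.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); the unit choice is an assumption


def acceptance_probability(delta_e, temperature, k_b=K_B):
    """Probability of accepting a step whose energy change is delta_e."""
    if delta_e <= 0:
        return 1.0  # a lower (or equal) energy is always accepted
    return math.exp(-delta_e / (k_b * temperature))


def accept_step(delta_e, temperature, rng=random):
    """Accept when P(delta_e) exceeds a uniform random number in [0, 1)."""
    return acceptance_probability(delta_e, temperature) > rng.random()


def anneal(energy, start, step, t_start=1000.0, t_end=1.0, cooling=0.95, seed=0):
    """Simulated annealing on a toy one-dimensional energy function.

    A single current solution is stored, a new one is generated randomly,
    and the temperature is lowered after every step.
    """
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        candidate = x + rng.uniform(-step, step)  # small random displacement
        e_new = energy(candidate)
        if accept_step(e_new - e, t, rng):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # hypothetical geometric cooling schedule
    return best_x, best_e
```

Holding `temperature` fixed instead of multiplying it by `cooling` would give the constant-temperature Monte Carlo variant mentioned in the same answer.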
8. Genetic methods (genetic algorithms and evolutionary programming) store multiple solutions. These solutions form a population of members.
Answer: a [Reason:] Each member has an associated score or fitness. During the search for the global optimal solution successive new populations are created by a procedure involving selection of the fittest members. These members then have offspring to create a new population. Differences arise in how the methods generate offspring. In a genetic algorithm two solutions are mated to form a new offspring solution. In evolutionary programming each member of the population generates an offspring by mutation.
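The difference in how the two families generate offspring can be sketched as below. The representation of a solution as a list of numbers (for example ligand torsion angles), the single-point crossover, the Gaussian mutation and the "lower score is fitter" convention are assumptions chosen for illustration only.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Genetic-algorithm style: mate two solutions at a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(parent, rng, scale=10.0):
    """Evolutionary-programming style: one parent yields one mutated offspring."""
    child = list(parent)  # copy so the parent is left untouched
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def select_fittest(population, fitness, keep):
    """Keep the `keep` members with the lowest score (lower is fitter here)."""
    return sorted(population, key=fitness)[:keep]
```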
9. Stochastic methods can guarantee reaching a global optimal solution and the methods are computationally costly in comparison to the other methods.
Answer: b [Reason:] The advantages of stochastic methods are that the ligand is able to explore conformational space in a relatively unconstrained way, frequently leading to the globally optimal solution. The disadvantages are that they cannot guarantee reaching a globally optimal solution and that they are computationally costly in comparison to the other methods.
10. Which of the following is true regarding Protein flexibility?
a) Methods have been described for the introduction of side-chain flexibility to both protein-ligand and protein-protein docking
b) Methods have been described for the introduction of side-chain flexibility to protein-ligand docking only
c) The Mean Field principle is rarely used in this
d) Methods have been described for the introduction of side-chain flexibility to protein-protein docking only
Answer: a [Reason:] The Mean Field approach is one type of what are called bounded search methods. Others include the Dead-end-elimination theorem and the A* algorithm. These methods use different approaches to find a solution and a detailed discussion is beyond the scope of this chapter. However, they use a multiple copy representation of protein side chains built using a rotamer library.
1. Which of the following is untrue about distance based methods?
a) The computed evolutionary distances can be used to construct a matrix of distances between all individual pairs of taxa
b) Clustering is the only method among the algorithms for the distance-based tree-building method
c) The clustering-type algorithms compute a tree based on a distance matrix starting from the most similar sequence pairs
d) Based on the pairwise distance scores in the matrix, a phylogenetic tree can be constructed for all the taxa involved
Answer: b [Reason:] The algorithms for the distance-based tree-building method can be subdivided into either clustering based or optimality based. The clustering-based algorithms include the unweighted pair group method using arithmetic average (UPGMA) and neighbor joining. The optimality-based algorithms compare many alternative tree topologies and select one that has the best fit between estimated distances in the tree and the actual evolutionary distances.
2. Which of the following is untrue about the Unweighted Pair Group Method Using Arithmetic Average?
a) The simplest clustering method is UPGMA, which builds a tree by a sequential clustering method
b) Given a distance matrix, it starts by grouping two taxa with the largest pairwise distance in the distance matrix
c) The distances between this new composite taxon and all remaining taxa are calculated to create a reduced matrix
d) The grouping process is repeated and another newly reduced matrix is created
Answer: b [Reason:] It starts by grouping two taxa with the smallest pairwise distance in the distance matrix. A node is placed at the midpoint or half distance between them. It then creates a reduced matrix by treating the new cluster as a single taxon.
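The grouping and matrix-reduction steps can be sketched as follows. This is a minimal illustration under stated assumptions: the distance matrix is passed as a dictionary keyed by `frozenset` pairs, the tree is returned as nested tuples, and the function name and data layout are hypothetical rather than taken from any particular package.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the pairwise distance.
    Returns (tree, height): a nested-tuple tree and the root height.
    """
    # Each cluster maps to (number of leaves it contains, node height).
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Group the two clusters with the smallest pairwise distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, _ = clusters.pop(a)
        size_b, _ = clusters.pop(b)
        merged = (a, b)
        height = d[frozenset((a, b))] / 2  # node placed at the midpoint
        # Reduced matrix: treat the new cluster as a single taxon whose
        # distance to every other cluster is the average over its leaves.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    tree, (_, height) = next(iter(clusters.items()))
    return tree, height
```

With the made-up distances AB = 2, AC = BC = 4 and AD = BD = CD = 6, the sketch joins A and B first, then C, then D.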
3. The basic assumption of the UPGMA method is that all taxa evolve at a constant rate and that they are equally distant from the root, implying that a molecular clock is in effect.
Answer: a [Reason:] Real data rarely meet this assumption, so UPGMA often produces erroneous tree topologies. However, owing to its speed of calculation, it has found extensive use in clustering analysis of DNA microarray data.
4. In the Neighbor Joining step, the UPGMA method uses unweighted distances and assumes that all taxa have constant evolutionary rates.
Answer: a [Reason:] Since this molecular clock assumption is often not met in biological sequences, the neighbor joining (NJ) method can be used to build more accurate phylogenetic trees. It is somewhat similar to UPGMA in that it builds a tree by using stepwise reduced distance matrices. However, the NJ method does not assume the taxa to be equidistant from the root.
5. The neighbor joining method corrects for unequal evolutionary rates between sequences by using a conversion step. This conversion requires the calculation of “r-values” and “transformed r-values” using the following formula ______
a) dAB’= dAB − 1/4 × (rA + rB)
b) dAB’= dAB − 1/2 × (rA + rB)
c) dAB’= dAB − 1/3 × (rA + rB)
d) dAB’= (dAB/3) − 1/2 × (rA + rB)
Answer: b [Reason:] dAB’ is the converted distance between A and B, and dAB is the actual evolutionary distance between A and B. The value of rA (or rB) is the sum of the distances of A (or B) to all other taxa.
6. A generalized expression of the r-value is ri calculated based on the following formula _______
a) ri = ∑dij + dj2
b) ri = ∑dij
c) ri = ∑dij + di
d) ri = ∑dij + dj
Answer: b [Reason:] i and j are two different taxa. The r-values are needed to create a modified distance matrix. The transformed r-values (r’) are used to determine the distances of an individual taxon to the nearest node: ri’ = ri / (n − 2), where n is the number of taxa.
7. The NJ tree construction process is somewhat similar to that used by UPGMA.
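The r-values and the converted distances can be computed directly from a distance matrix. The sketch below follows the formulas exactly as they are stated in this quiz; the dictionary-of-`frozenset` layout and the function names are assumptions, and n is taken to be the number of taxa.

```python
def r_values(names, dist):
    """r_i: sum of the distances from taxon i to all other taxa."""
    return {
        i: sum(dist[frozenset((i, j))] for j in names if j != i)
        for i in names
    }


def converted_distances(names, dist):
    """d'_AB = d_AB - 1/2 * (r_A + r_B) for every pair in the matrix."""
    r = r_values(names, dist)
    return {pair: d - 0.5 * sum(r[t] for t in pair) for pair, d in dist.items()}


def transformed_r_values(names, dist):
    """r'_i = r_i / (n - 2), with n taken to be the number of taxa."""
    r = r_values(names, dist)
    n = len(names)
    return {i: r[i] / (n - 2) for i in names}
```

The pair with the smallest converted distance is the one joined first as a node.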
Answer: b [Reason:] Rather than building trees from the closest pair of branches and progressing to the entire tree, the NJ tree method begins with a completely unresolved star tree by joining all taxa onto a single node and progressively decomposes the tree by selecting pairs of taxa based on the above modified pairwise distances. This allows the taxa with the shortest corrected distances to be joined first as a node.
8. Which of the following is untrue about the Optimality-Based Methods?
a) The clustering-based methods produce multiple trees as output
b) Optimality-based methods select a tree that best fits the actual evolutionary distance matrix
c) There is no criterion in judging how this tree is compared to other alternative trees
d) Optimality-based methods have a well-defined algorithm to compare all possible tree topologies
Answer: a [Reason:] The clustering-based methods produce a single tree as output. Based on the differences in optimality criteria, there are two types of algorithms, Fitch–Margoliash and minimum evolution, that are described next. The exhaustive search for an optimal tree necessitates a slow computation, which is a clear drawback especially when the dataset is large.
9. Which of the following is untrue about the Fitch–Margoliash?
a) Method selects a best tree among all possible trees based on minimal deviation between the distances calculated in the overall branches in the tree and the distances in the original dataset
b) It starts by randomly clustering two taxa in a node
c) It starts by creating three equations to describe the distances
d) The method searches for some specific tree topologies
Answer: d [Reason:] It solves the three algebraic equations for unknown branch lengths. The clustering of the two taxa helps to create a newly reduced matrix. This process is iterated until a tree is completely resolved. The method searches for all tree topologies and selects the one that has the lowest squared deviation of actual distances and calculated tree branch lengths.
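For three taxa joined at a single node, the three equations a + b = dAB, a + c = dAC and b + c = dBC have a closed-form solution. The sketch below solves them and also computes a plain, unweighted sum of squared deviations; the function names and the unweighted form of the criterion are assumptions made for illustration.

```python
def three_point_branch_lengths(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the branch lengths a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


def squared_deviation(observed, fitted):
    """Sum of squared differences between observed and tree-derived distances."""
    return sum((observed[k] - fitted[k]) ** 2 for k in observed)
```

For example, made-up distances dAB = 22, dAC = 39 and dBC = 41 give branch lengths 10, 12 and 29, which reproduce the three input distances exactly.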
10. Minimum evolution (ME) constructs a tree with a similar procedure, but uses a different optimality criterion that finds a tree among all possible trees with a minimum overall branch length. The optimality criterion relies on the formula S = ∑bi where bi is the (i)th branch length.
Answer: a [Reason:] Searching for the minimum total branch length is an indirect approach to achieving the best fit of the branch lengths with the original dataset. Analysis has shown that minimum evolution in fact slightly outperforms the least-squares-based FM method.
1. Which of the following is untrue about DNA sequencing methods?
a) Purified fragments of DNA cut from plasmid/phage clones or amplified by polymerase chain reaction (PCR)
b) Clones of DNA fragments are denatured to single strands, and one of the strands is hybridized to an oligonucleotide primer
c) Taq polymerase is quite heat sensitive
d) New strands of DNA are synthesized from the end of the primer
Answer: c [Reason:] In an automated procedure, new strands of DNA are synthesized from the end of the primer by heat-resistant Taq polymerase from a pool of deoxyribonucleotide triphosphates (dNTPs) that includes a small amount of one of four chain-terminating nucleotides (ddNTPs).
2. Using ddATP, the resulting synthesis creates a set of nested DNA fragments, each one ending at one of the As in the sequence through the substitution of a fluorescent-labeled ddATP.
Answer: a [Reason:] A similar set of fragments is made for each of the other three bases. But each set is labeled with a different fluorescent ddNTP.
3. The combined mixture of all labeled DNA fragments is electrophoresed to _____ the fragments by______ and the ladder of fragments is scanned for the presence of each of the four labels.
a) separate, size
b) separate, pH
c) assimilate, pH
d) assimilate, size
Answer: a [Reason:] A computer program then determines the probable order of the bands and predicts the sequence. Depending on the actual procedure being used, one run may generate a reliable sequence of as many as 500 nucleotides.
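The logic of reading a sequence from the ladder can be sketched as a toy simulation. It assumes an idealized run in which every possible terminated fragment is present and perfectly resolved; the function names are hypothetical.

```python
def nested_fragments(new_strand, base):
    """Lengths of the fragments that terminate at each occurrence of `base`.

    new_strand is the newly synthesized strand, read from the primer end.
    """
    return [i + 1 for i, b in enumerate(new_strand) if b == base]


def read_ladder(new_strand):
    """Pool the four labeled fragment sets, sort by size, read the labels."""
    ladder = []
    for base in "ACGT":
        ladder += [(length, base) for length in nested_fragments(new_strand, base)]
    ladder.sort()  # electrophoresis separates the fragments by size
    return "".join(label for _, label in ladder)
```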
4. The sequence can also be verified by making an oligonucleotide primer complementary to the distal part of the readable sequence and using it to obtain the sequence of the complementary strand on the original DNA template.
Answer: a [Reason:] For accurate work, a printout of the scan is usually examined for abnormalities that decrease the quality of the sequence, and the sequence may then be edited manually. The first sequence can also be extended by making a second oligonucleotide matching the distal end of the readable sequence and using this primer to read more of the original template.
5. When the process is fully automated, a number of priming sites may be used to obtain sequencing results that give optimal separation of bands in each region of the sequence.
Answer: a [Reason:] By repeating this procedure, both strands of a DNA fragment several kilobases in length can be sequenced. Sequential sequencing of a DNA molecule using oligonucleotide primers is done later.
6. To sequence larger molecules, individual chromosomes are purified and broken into _____ or larger random fragments, which are cloned into vectors designed for large molecules.
Answer: b [Reason:] To sequence larger molecules, such as human chromosomes, individual chromosomes are purified and broken into 100-kb or larger random fragments, which are cloned into vectors designed for large molecules, such as yeast (YAC) or bacterial (BAC) artificial chromosomes. In a laborious procedure, the resulting library is screened for fragments called contigs, which have overlapping or common sequences, to produce an integrated map of the chromosome.
7. Many levels of clone redundancy may be required to build a consensus map because individual clones can have _______
c) two separate fragments
Answer: d [Reason:] Option d is the exception because it has little relevance to clone redundancy. Clones with the other abnormalities do not reflect the correct map and have to be eliminated.
8. Once the correct map has been obtained, unique overlapping clones are chosen for sequencing.
Answer: a [Reason:] However, these molecules are too large for direct sequencing. One procedure for sequencing these clones is to subclone them further into smaller fragments that are of sizes suitable for sequencing, make a map of these clones and then sequence overlapping clones. However, this method is expensive because it requires a great deal of time to keep track of all the subclones.
9. An alternative method is to sequence all the subclones, produce a computer database of the sequences, and then have the computer assemble the sequences from the overlaps that are found.
Answer: a [Reason:] Up to 10 levels of redundancy are used to get around the problem of a small fraction of abnormal clones. This procedure was first used to obtain the sequence of the 1.8-Mb chromosome of the bacterium Haemophilus influenzae by The Institute for Genomic Research (TIGR) team. Only a few regions could not be joined because of a problem subcloning those regions into plasmids; these had to be sequenced manually from another library of phage subclones.
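Assembling sequences "from the overlaps that are found" can be illustrated with a greedy merge of the best-overlapping reads. This is a toy sketch, not the assembler used in any real project: it assumes error-free reads on one strand, and the function names and the minimum overlap length are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining reads stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads
```

Because the merge relies only on matching sequence, two reads from different regions that share a repeat would be joined just as readily, which is the difficulty with repetitive genomes raised in the next question.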
10. Which of the following is untrue about Shotgun Sequencing?
a) When DNA fragments derived from different chromosomal regions have repeats of the same sequence, they will appear to overlap
b) When DNA fragments derived from different chromosomal regions have repeats of the same sequence, they will appear to scrutinize
c) In a new whole shotgun approach, Celera Genomics is sequencing both ends of DNA fragments of short (2 kb), medium (10 kb), and long (BAC or >100 kb) lengths
d) A large number of reads are then assembled by computer
Answer: b [Reason:] A controversy has arisen as to whether or not the above shotgun sequencing strategy can be applied to genomes with repetitive sequences such as those likely to be encountered in sequencing the human genome. This method has been used to assemble the genome of the fruit fly Drosophila melanogaster after removal of the most highly repetitive regions and also to assemble a significant proportion of the human genome.
1. Which of the following is not a software for dot plot analysis?
Answer: a [Reason:] Various software tools are currently available for dot plot interpretation. Among these, SIM is used for this kind of alignment through the dot-plot method; the name given in option a is an incorrect abbreviation.
2. Software for dot plot analysis performs several tasks. Which one of the following is not performed?
a) Gap open penalty
b) Gap extend penalty
c) Expectation threshold
d) Change or mutate residues
Answer: d [Reason:] The gap penalties mentioned above are used to determine the score of the aligned sequences. Changing residues is not a task of dot plot software, since other programs exist for that purpose and the main objective here is to score the alignment.
3. For palindromic sequences, what is the structure of the dot plot?
a) 2 intersecting diagonal lines at the midpoint
b) One diagonal
c) Two parallel diagonals
d) No diagonal
Answer: a [Reason:] For perfectly aligned sequences, the dot plot forms a diagonal. For palindromic sequences, i.e. sequences that are symmetrical about their midpoint, there are 2 intersecting diagonals on the plot.
4. For significantly aligning sequences what is the resulting structure on the plot?
a) Intercrossing lines
b) Crosses everywhere
c) Vertical lines
d) A diagonal and lines parallel to diagonal
Answer: d [Reason:] If the sequences align, a distinctly bold diagonal is visible on the plot. If the alignment is slightly imperfect, the diagonal is broken to some extent and forms short lines parallel to it.
5. When was this method first described?
Answer: c [Reason:] This method was first described in 1970. Briefly, this method involves constructing a matrix with one of the sequences to be compared running horizontally across the bottom, and the other running vertically along the left-hand side.
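The matrix construction can be sketched in a few lines. The function names are assumptions, and for simplicity the text rendering prints the horizontal sequence along the top rather than the bottom.

```python
def dot_matrix(seq_x, seq_y):
    """Rows of booleans: cell [i][j] is True when seq_y[i] == seq_x[j]."""
    return [[a == b for b in seq_x] for a in seq_y]


def render(seq_x, seq_y):
    """Draw the matrix as text: seq_x runs horizontally, seq_y vertically."""
    rows = ["  " + seq_x]
    for a, row in zip(seq_y, dot_matrix(seq_x, seq_y)):
        rows.append(a + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(rows)
```

Comparing a symmetrical string such as "ABCDCBA" with itself marks exactly the main diagonal and the anti-diagonal, which is the pair of intersecting diagonals described for palindromic sequences.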
6. Who were the inventors of this method?
b) Margaret Preston
c) Gibbs and McIntyre
Answer: c [Reason:] The first computer-aided sequence comparison is called “dot-matrix analysis” or simply dot-plot. The first published account of this method is by Gibbs and McIntyre (1970. The diagram, a method for comparing sequences. Eur. J. Biochem. 16: 1-11).
7. Which of the following is true for EMBOSS Dottup?
a) Allows you to specify threshold
b) Doesn’t allow you to specify threshold
c) Doesn’t allow you to specify window size
d) If all cells in the window are identity, it colors in some specific cells in the window
Answer: b [Reason:] The EMBOSS Dottup doesn’t allow you to specify threshold but allows you to specify window size. Also, if all cells in the window are identity, it colors in all the cells in the window.
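The window rule in this answer (color every cell of a window only when all of its cells are identities) can be sketched as below. This illustrates the idea only and is not EMBOSS code; the function name and the list-of-lists output are assumptions.

```python
def windowed_dots(seq_x, seq_y, window):
    """Mark all cells of each diagonal window in which every position matches."""
    rows, cols = len(seq_y), len(seq_x)
    marked = [[False] * cols for _ in range(rows)]
    for i in range(rows - window + 1):
        for j in range(cols - window + 1):
            if all(seq_y[i + k] == seq_x[j + k] for k in range(window)):
                for k in range(window):
                    marked[i + k][j + k] = True
    return marked
```

A larger window suppresses the isolated off-diagonal dots that arise from random matches, at the cost of missing short or imperfect similarities.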
8. Isolated dots that are not on the diagonal represent exact matches.
Answer: b [Reason:] Those isolated dots represent random matches. The dots on the diagonal represent the perfect alignment and the dots with vertical and horizontal shifts show insertions and deletions.
9. Vertical frame shifts show ______ while the horizontal ones show _______
a) insertion, insertion
b) insertion, deletion
c) deletion, deletion
d) deletion, insertion
Answer: b [Reason:] Deletion and insertion of nucleotides is quite common in alignment process. The dot plot easily represents them with vertical and horizontal shifts. And the mutations are totally out of the diagonal zone.
10. Dot plot of repeating elements would be small crosses on plot.
Answer: False [Reason:] Repeating elements appear as a repetitive set of parallel lines. The better the repetition, the cleaner the parallel lines. Intersecting lines, by contrast, indicate palindromic sequences.
1. Use of the dynamic programming method requires a scoring system for the comparison of symbol pairs, and a scheme for GAP penalties.
Answer: a [Reason:] Once those parameters have been set, the resulting alignment for two sequences should always be the same. Hence, the use of the dynamic programming method requires a scoring system for the comparison of symbol pairs (nucleotides for DNA sequences and amino acids for protein sequences), and a scheme for insertion/deletion (GAP) penalties.
2. After the derivation, the outputs of the dynamic programming are ratios called even scores.
Answer: b [Reason:] After the derivation, the outputs are ratios called odds scores. The ratios are transformed to logarithms of odds scores, called log odds scores, so that the scores of sequential pairs may be added to reflect the overall odds of a real to a chance alignment. This happens in the Dayhoff PAM250 and BLOSUM62 matrices.
3. The matrices PAM250 and BLOSUM62 contain _______
a) positive and negative values
b) positive values only
c) negative values only
d) neither positive nor negative values, just the percentage
Answer: a [Reason:] These matrices contain positive and negative values, reflecting the likelihood of each amino acid substitution in related proteins. Using these tables, an alignment of a sequential set of amino acid pairs with no gaps receives an overall score that is the sum of the positive and negative log odds scores for each individual amino acid pair in the alignment.
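The step from odds ratios to additive log odds scores can be sketched as follows. The pair and background frequencies in the usage note are made-up numbers, the logarithm base is a free parameter here, and the tiny score table in the test is hypothetical rather than taken from PAM250 or BLOSUM62.

```python
import math


def log_odds(q_ij, p_i, p_j, base=2.0):
    """Log of the ratio of the observed pair frequency to chance expectation.

    q_ij: frequency of the aligned pair in related proteins.
    p_i, p_j: background frequencies of the two amino acids.
    """
    return math.log(q_ij / (p_i * p_j), base)


def alignment_score(pairs, matrix):
    """Sum the log odds scores of the aligned pairs (ungapped alignment)."""
    return sum(matrix[frozenset(pair)] for pair in pairs)
```

A pair seen twice as often as chance predicts (for example q = 0.02 with p = 0.1 for both residues) scores about +1 in base 2, while a pair seen half as often as chance scores about −1, which is why the matrices hold both positive and negative values.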
4. The higher is the score in the alignment, _________
a) the more significant is the alignment
b) the less it resembles alignments in related proteins
c) the less significant is the alignment
d) the less it aligns with the related protein sequence
Answer: a [Reason:] In the scoring system, the higher this score, the more significant is the alignment, or the more it resembles alignments in related proteins. Also, the score given for gaps in aligned sequences is negative, because such misaligned regions should be uncommon in sequences of related proteins. Such a score will reduce the score obtained from an adjacent, matching region upstream in the sequences.
5. Gaps are added to the alignment because it ______
a) increases the matching of identical amino acids at subsequent portions in the alignment
b) increases the matching of dissimilar amino acids at subsequent portions in the alignment
c) reduces the overall score
d) enhances the area of the sequences
Answer: a [Reason:] In alignment process, gaps are added to the alignment in a manner that increases the matching of identical or similar amino acids at subsequent portions in the alignment. Ideally, when two similar protein sequences are aligned, the alignment should have long regions of identical or related amino acid pairs and very few gaps. As the sequences become more distant, more mismatched amino acid pairs and gaps should appear.
6. Which of the following is not a description of dynamic programming algorithm?
a) A method of sequence alignment
b) A method that can take gaps into account
c) A method that requires a manageable number of comparisons
d) This method doesn’t provide an optimal (highest scoring) alignment
Answer: d [Reason:] The method of sequence alignment by dynamic programming provides an optimal (highest scoring) alignment as an output. The quality of the alignment between two sequences is calculated using a scoring system that favors the matching of related or identical amino acids and penalizes for poorly matched amino acids and gaps.
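A compact example of the dynamic programming approach is the Needleman-Wunsch global alignment, sketched below with a simple identity-based scoring system and a linear gap penalty. The specific scores (match +1, mismatch −1, gap −2) are arbitrary assumptions; a real protein alignment would substitute a matrix such as PAM250 or BLOSUM62.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: every pair of characters is compared exactly once.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover the alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))
```

The table has (len(a) + 1) × (len(b) + 1) cells, which is the "manageable number of comparisons" referred to in option c, and the traceback yields a highest-scoring alignment for the chosen scoring system.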
7. Which of the following is not a site on internet for alignment of sequence pairs?
d) BCM Search Launcher
Answer: a [Reason:] BLASTP is used under BLAST 2 sequence alignment. Also, the BLAST algorithm normally used for database similarity searches can be used to align two sequences. SIM is a local similarity program for finding alternative alignments.
8. Dayhoff PAM matrices are based on an evolutionary model of protein change, whereas BLOSUM matrices are designed to identify members of the same family.
Answer: a [Reason:] There are a very large number of amino acid scoring matrices in use, some much more popular than others, and these scoring matrices are designed for different purposes. Some, such as the Dayhoff PAM matrices, are based on an evolutionary model of protein change, whereas others, such as the BLOSUM matrices, are designed to identify members of the same family. Alignments between DNA sequences require similar kinds of considerations.
9. A feature of the dynamic programming algorithm is that the alignments obtained depend on the choice of a scoring system for comparing character pairs and penalty scores for gaps.
Answer: a [Reason:] For an algorithm, the output depends on the choice of a scoring system. For protein sequences, the simplest system of comparison is one based on identity. A match in an alignment is only scored if the two aligned amino acids are identical. However, one can also examine related protein sequences that can be aligned easily and find which amino acids are commonly substituted for each other.
10. Which of the following is untrue regarding dynamic programming algorithm?
a) The method compares every pair of characters in the two sequences and generates an alignment
b) The output alignment will include matched and mismatched characters and gaps in the two sequences that are positioned so that the number of matches between identical or related characters is the maximum possible
c) The dynamic programming algorithm provides a reliable computational method for aligning DNA and protein sequences
d) This doesn’t allow making evolutionary predictions on the basis of sequence alignments
Answer: d [Reason:] Optimal alignments provide useful information to biologists concerning sequence relationships by giving the best possible information as to which characters in a sequence should be in the same column in an alignment, and which are insertions in one of the sequences (or deletions on the other). This information is important for making functional, structural, and evolutionary predictions on the basis of sequence alignments.