
Multiple choice question for engineering

Set 1

1. The BLAST program was developed in _______
a) 1992
b) 1995
c) 1990
d) 1991

View Answer

Answer: c [Reason:] The BLAST program was developed by Stephen Altschul of NCBI in 1990 and has since become one of the most popular programs for sequence analysis. BLAST uses heuristics to align a query sequence with all sequences in a database.

2. In sequence alignment by BLAST, each word from query sequence is typically _______ residues for protein sequences and _______ residues for DNA sequences.
a) ten, eleven
b) three, three
c) three, eleven
d) three, ten

View Answer

Answer: c [Reason:] The first step is to create a list of words from the query sequence. Each word is typically three residues for protein sequences and eleven residues for DNA sequences. The list includes every possible word extracted from the query sequence. This step is also called seeding.
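The seeding step described above can be illustrated with a minimal Python sketch. The function name `query_words` and the toy sequences are assumptions made for illustration; they are not part of BLAST itself.

```python
def query_words(sequence, word_size):
    """Return every overlapping word of length word_size from the query.

    Hypothetical helper for illustration only; real BLAST also expands
    protein words into a neighborhood of similar high-scoring words.
    """
    return [sequence[i:i + word_size]
            for i in range(len(sequence) - word_size + 1)]


# Typical word sizes from the explanation above: 3 for protein, 11 for DNA.
protein_words = query_words("MKVLAT", 3)
dna_words = query_words("ATGCGTACGTTAGC", 11)
print(protein_words)
print(dna_words)
```

A query of length n yields n - w + 1 overlapping words of size w, which is why the list covers every possible word in the query.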

3. In sequence alignment by BLAST, the second step is to search a sequence database for the occurrence of these words.
a) True
b) False

View Answer

Answer: a [Reason:] This step is to identify database sequences containing the matching words. The matching of the words is scored by a given substitution matrix. A word is considered a match if it is above a threshold.

4. The final step involves pairwise alignment by extending from the words in both directions while counting the alignment score using the same substitution matrix.
a) True
b) False

View Answer

Answer: a [Reason:] The extension continues until the score of the alignment drops below a threshold due to mismatches (the drop threshold is twenty-two for proteins and twenty for DNA). The resulting contiguous aligned segment pair without gaps is called a high-scoring segment pair (HSP). In the original version of BLAST, the highest scored HSPs are presented as the final report. They are also called maximum scoring pairs.
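A rough sketch of ungapped extension from a seed is given below. It is an illustration under stated assumptions, not BLAST's actual code: the function name `extend_seed`, the +1/-1 match and mismatch scores (standing in for a substitution matrix), and the toy drop threshold of 2 are all hypothetical. Extension in each direction stops once the running score has fallen by at least `drop` below the best score seen so far.

```python
def extend_seed(query, subject, q_start, s_start, word_size,
                match=1, mismatch=-1, drop=2):
    """Extend a seed without gaps in both directions.

    Returns (score, q_begin, q_end) with q_end exclusive.
    Scoring and drop threshold are toy assumptions for illustration.
    """
    def walk(q_pos, s_pos, step):
        best = running = 0
        best_len = length = 0
        while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
            running += match if query[q_pos] == subject[s_pos] else mismatch
            length += 1
            if running > best:
                best, best_len = running, length
            elif best - running >= drop:
                break  # score dropped too far below the best seen
            q_pos += step
            s_pos += step
        return best, best_len

    seed_score = sum(
        match if query[q_start + i] == subject[s_start + i] else mismatch
        for i in range(word_size)
    )
    right_score, right_len = walk(q_start + word_size, s_start + word_size, 1)
    left_score, left_len = walk(q_start - 1, s_start - 1, -1)
    total = seed_score + left_score + right_score
    return total, q_start - left_len, q_start + word_size + right_len


score, begin, end = extend_seed("AACGTACGTTT", "GGCGTACGTCC", 4, 4, 3)
print(score, begin, end)
```

Only the best-scoring prefix of each extension is kept, so trailing mismatches encountered just before the stop are trimmed from the reported segment.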

5. A recent improvement in the implementation of BLAST is the ability to provide gapped alignment.
a) True
b) False

View Answer

Answer: a [Reason:] In gapped BLAST, the highest scored segment is chosen to be extended in both directions using dynamic programming where gaps may be introduced. The extension continues if the alignment score is above a certain threshold; otherwise it is terminated. However, the overall score is allowed to drop below the threshold only if it is temporary and rises again to attain above threshold values. Final trimming of terminal regions is needed before producing a report of the final alignment.

6. Which of the following is not a variant of BLAST?
a) BLASTN
b) BLASTP
c) BLASTX
d) TBLASTNX

View Answer

Answer: d [Reason:] BLAST is a family of programs that includes BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX. BLASTN queries nucleotide sequences with a nucleotide sequence database. For the protein-level comparisons, the alignment scoring is based on the BLOSUM62 matrix by default.

7. BLASTX uses protein sequences as queries to search against a protein sequence database.
a) True
b) False

View Answer

Answer: b [Reason:] BLASTP, and not BLASTX, uses protein sequences as queries to search against a protein sequence database. BLASTX uses nucleotide sequences as queries and translates them in all six reading frames to produce translated protein sequences, which are used to query a protein sequence database.
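The idea of reading a nucleotide query in all six frames can be sketched as follows. This is a minimal illustration that stops short of translation (a codon table is omitted to keep it short); the function name `six_frames` and the frame labels are assumptions, not BLASTX internals.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frames(dna):
    """Return the codons of all six reading frames, keyed by frame label.

    Frames +1..+3 are read from the given strand, frames -1..-3 from
    its reverse complement. Labels are a hypothetical convention.
    """
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for label, strand in (("+", dna), ("-", reverse)):
        for offset in range(3):
            codons = [strand[i:i + 3]
                      for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


for name, codons in six_frames("ATGGCCTAA").items():
    print(name, codons)
```

Each list of codons would then be translated to a protein sequence, giving six candidate protein queries for the database search.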

8. TBLASTX queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames.
a) True
b) False

View Answer

Answer: b [Reason:] TBLASTN queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames. TBLASTX uses nucleotide sequences, which are translated in all six frames, to search against a nucleotide sequence database that has all the sequences translated in six frames. In addition, there is also a bl2seq program that performs local alignment of two user-provided input sequences. The graphical output includes horizontal bars and a diagonal in a two-dimensional diagram showing the overall extent of matching between the two sequences.

9. Which of the following is not correct about BLAST?
a) The BLAST web server has been designed in such a way as to simplify the task of program selection.
b) The programs are organized based on the type of query sequences
c) The programs are organized based on the type of nucleotide sequences, or nucleotide sequence to be translated
d) BLAST is not based on heuristic searching methods

View Answer

Answer: d [Reason:] BLAST and FASTA are based on heuristic searching methods. In addition, programs for special purposes are grouped separately; for example, bl2seq, immunoglobulin BLAST, and VecScreen, a program for removing contaminating vector sequences.

10. If one is looking for protein homologs encoded in newly sequenced genomes, one may use TBLASTN, which translates nucleotide database sequences in all six open reading frames.
a) True
b) False

View Answer

Answer: a [Reason:] This may help to identify protein coding genes that have not yet been annotated. If a DNA sequence is to be used as the query, a protein-level comparison can be done with TBLASTX. However, both programs are very computationally intensive and the search process can be very slow.

Set 2

1. By whom and when were the Bayesian methods applied first?
a) Smith-Waterman, 1981
b) Agarwal and States, 1996
c) Smith-Waterman, 1996
d) Agarwal and States, 1981

View Answer

Answer: b [Reason:] Agarwal and States, in 1996, applied Bayesian methods to provide the best estimate of the evolutionary distance between two DNA sequences, for example, sequences of the same length that have a certain level of mismatches.

2. With the application of Bayesian methods, the most probable repeat length and evolutionary time since the repeat was formed may be derived.
a) True
b) False

View Answer

Answer: a [Reason:] Sequences of this type originated from gene duplication events in the yeast and Caenorhabditis elegans genomes. When there are multiple mismatches between such repeated sequences, it is difficult to determine the most likely length of the repeats. Here the methods can be used.

3. If the purpose is to calculate the probability of one event AND a second event, the odds scores for the events are _________
a) added
b) multiplied
c) multiplied and added
d) subtracted

View Answer

Answer: b [Reason:] An example is the calculation of the odds of an alignment of two sequences from the alignment scores for each of the matched pairs of bases or amino acids in the alignment. The odds scores for the pairs are multiplied. Usually, the log odds score for the first pair is added to that for the second, etc., until the scores for every pair have been added.
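The relationship between multiplying odds and adding log odds can be checked with a short sketch. The per-pair odds values and the function name `alignment_odds` are made up for illustration, and base-2 logarithms are an assumption about the units.

```python
import math


def alignment_odds(pair_odds):
    """Combine per-pair odds for an alignment (event 1 AND event 2 AND ...).

    Returns the product of the odds and the sum of the log2 odds.
    """
    product = math.prod(pair_odds)
    log_sum = sum(math.log2(odds) for odds in pair_odds)
    return product, log_sum


# Hypothetical odds for three aligned pairs.
product, log_sum = alignment_odds([2.0, 4.0, 0.5])
print(product, log_sum, 2 ** log_sum)
```

Converting the summed log odds back with `2 ** log_sum` recovers the product, which is why scoring matrices store log odds and alignment programs simply add them.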

4. Another type of probability analysis is to calculate the odds score for one event OR a second event, or for a series of events. In this case, the odds scores are _______
a) multiplied
b) subtracted
c) added and multiplied
d) added

View Answer

Answer: d [Reason:] An example is the calculation of the odds score for a given sequence alignment using a series of alternative PAM scoring matrices. The alignment scores are calculated in log odds units and then converted into odds scores.
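The OR case can be sketched in the same spirit: log odds scores obtained under alternative scoring matrices are first converted to odds and then added. The scores below are invented, the function name `combined_odds` is hypothetical, and base-2 log odds are an assumption about the units.

```python
def combined_odds(log_odds_scores):
    """Convert log2 odds scores to odds and add them (event 1 OR event 2 ...)."""
    return sum(2 ** score for score in log_odds_scores)


# Hypothetical alignment scores (in bits) under three alternative PAM matrices.
print(combined_odds([1.0, 2.0, 3.0]))
```

Note that the log odds themselves are not added here; adding log odds corresponds to multiplying odds, which is the AND case.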

5. In Bayesian methods, a difficulty with making estimations is that the estimate depends on the assumption that the mutation rate in sequences has been constant with time and that the rate of mutation of all nucleotides is the same.
a) True
b) False

View Answer

Answer: a [Reason:] The assumption mentioned above (the molecular clock hypothesis) is made to reduce the complications. Such problems may be solved by scoring different portions of a sequence with a different scoring matrix, and then using the above Bayesian methods to calculate the best evolutionary distance.

6. Another difficulty in Bayesian methods is deciding on the length of sequence that was duplicated
a) True
b) False

View Answer

Answer: a [Reason:] In genomes, the presence of repeats may be revealed by long regions of matched sequence positions dispersed among regions of sequence positions that do not match. However, as the frequency of mismatches is increased, it becomes difficult to determine the extent of the repeated region.

7. A length and distance that gives the highest overall probability may then be determined. Such alignments are initially found using ________
a) a particular scoring matrix only
b) an alignment algorithm only
c) an alignment algorithm and a particular scoring matrix
d) dot method

View Answer

Answer: c [Reason:] Analysis of the yeast and C. elegans genomes for such repeats has underscored the importance of using a range of DNA scoring matrices such as PAM1 to PAM120 if most repeats are to be found. The application of the above Bayesian analysis allows a determination of the probability distributions as a function of both length of the repeated region and evolutionary distance.

8. Which of the following features of Bayesian methods is a disadvantage?
a) A length and distance that gives the highest overall probability may be determined
b) They are used to calculate evolutionary distance
c) Computationally Bayesian methods are better
d) A specific mutational model is required

View Answer

Answer: d [Reason:] One disadvantage of the Bayesian approach is that a specific mutational model is required, whereas other methods, such as the maximum likelihood approach, can be used to estimate the best mutational model as well as the distance. Computationally, however, the Bayesian method is much more practical.

9. Zhu et al. (1998) devised a computer program called the Bayes block aligner, which in effect slides ____ sequences along each other to find the ______ ungapped regions or blocks.
a) two, least scoring
b) two, highest scoring
c) multiple, highest scoring
d) multiple, least scoring

View Answer

Answer: b [Reason:] These blocks are then joined in various combinations to produce alignments. There is no need for gap penalties because only the aligned sequence positions in blocks are scored. Instead of using a given substitution matrix and gap scoring system to find the highest scoring alignment, a Bayesian statistical approach is used.

10. Unlike the commonly used methods for aligning a pair of sequences, the Bayesian method _______ using a particular scoring matrix or designated gap penalties.
a) does not depend on
b) depends on
c) is based on
d) involves

View Answer

Answer: a [Reason:] Because it doesn’t depend on the mentioned techniques, there is no need to choose a particular scoring system or gap penalty. Instead, a number of different scoring matrices and range of block numbers up to some reasonable maximum are examined, and the most probable alignments are determined. The Bayesian method provides a distribution of alignments weighted according to probability and can also provide an estimate of the evolutionary distance between the sequences that is independent of scoring matrix and gaps.

Set 3

1. Which of the following is true regarding the methods of gene prediction?
a) They solely consist of a type called ab initio–based methods
b) The ab initio–based approach predicts genes based on the given sequence alone
c) The ab initio–based approach predicts genes based on the given sequence and relative homology data
d) They solely consist of a type called homology-based approaches

View Answer

Answer: b [Reason:] The current gene prediction methods can be classified into two major categories, ab initio–based and homology-based approaches. The ab initio–based approach predicts genes based on the given sequence alone.

2. The ab initio–based approaches rely on two major features associated with genes, one of them being the existence of gene signals, which include start and stop codons, intron splice signals, transcription factor binding sites, etc.
a) True
b) False

View Answer

Answer: a [Reason:] They also include ribosomal binding sites, and polyadenylation (poly-A) sites. In addition, the triplet codon structure limits the coding frame length to multiples of three, which can be used as a condition for gene prediction.
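How start and stop codons plus the multiple-of-three constraint can flag candidate coding regions is sketched below. This is a deliberately simple forward-strand scan, not any published gene finder; the function name `find_orfs` and the `min_codons` parameter are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ATG...stop ORFs on the forward strand.

    end is exclusive and includes the stop codon, so end - start is
    always a multiple of three. Hypothetical helper for illustration.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

Real ab initio programs combine such signals with statistical content measures, because open reading frames alone produce many false positives.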

3. The ab initio–based approaches rely on two major features associated with genes, one of them being gene content, which is a statistical description of coding regions.
a) True
b) False

View Answer

Answer: a [Reason:] It has been observed that nucleotide composition and statistical patterns of the coding regions tend to vary significantly from those of the non-coding regions. The unique features can be detected by employing probabilistic models such as Markov models or hidden Markov models to help distinguish coding from non-coding regions.
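A toy version of the Markov model idea is sketched below: one first-order model is trained on coding sequence, another on non-coding sequence, and a query is scored by the log odds of the two. The training strings are artificial, the pseudocount is an assumption, and the function names are hypothetical; real gene finders use higher-order models (for example fifth-order) trained on large data sets.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, pseudocount=1.0):
    """Estimate first-order transition probabilities with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev][n] for n in ALPHABET)
        total += pseudocount * len(ALPHABET)
        model[prev] = {n: (counts[prev][n] + pseudocount) / total
                       for n in ALPHABET}
    return model


def coding_log_odds(seq, coding, noncoding):
    """Sum of log2(P_coding / P_noncoding) over all transitions in seq."""
    return sum(math.log2(coding[p][n] / noncoding[p][n])
               for p, n in zip(seq, seq[1:]))


# Artificial training data, chosen only to make the two models differ.
coding_model = train_markov(["GCGCGCGCGC"])
noncoding_model = train_markov(["ATATATATAT"])
print(coding_log_odds("GCGC", coding_model, noncoding_model))
print(coding_log_odds("ATAT", coding_model, noncoding_model))
```

A positive score means the query looks more like the coding training data, a negative score more like the non-coding data.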

4. The homology-based method makes predictions based on significant matches of the query sequence with sequences of known genes.
a) True
b) False

View Answer

Answer: a [Reason:] For instance, if a translated DNA sequence is found to be similar to a known protein or protein family from a database search, this can be strong evidence that the region codes for a protein. Alternatively, when possible exons of a genomic DNA region match a sequenced cDNA, this also provides experimental evidence for the existence of a coding region.

5. FGENESB is a web-based program that is also based on fifth-order HMMs for detecting coding regions.
a) True
b) False

View Answer

Answer: a [Reason:] The program is specifically trained for bacterial sequences. It uses the Viterbi algorithm to find an optimal match for the query sequence with the intrinsic model. A linear discriminant analysis (LDA) is used to further distinguish coding signals from non-coding signals.

6. Which of the following is untrue about GeneMark?
a) It is a suite of gene prediction programs based on the fifth-order HMMs
b) The main program is trained on a number of complete microbial genomes
c) A GeneMark heuristic program can be used to improve accuracy
d) If the sequence to be predicted is from a non-listed organism, the most closely related organism can be chosen as the basis for computation

View Answer

Answer: c [Reason:] Another option for predicting genes from a new organism is to use a self-trained program GeneMarkS as long as the user can provide at least 100 kbp of sequence on which to train the model. If the query sequence is shorter than 100 kbp, a GeneMark heuristic program can be used with some loss of accuracy. In addition to predicting prokaryotic genes, GeneMark also has a variant for eukaryotic gene prediction using HMM.

7. Which of the following is untrue about Glimmer?
a) It stands for Gene Locator and Interpolated Markov Modeler
b) It is a UNIX program from TIGR
c) It does not necessarily use the IMM algorithm
d) It is used to predict potential coding regions

View Answer

Answer: c [Reason:] The computation consists of two steps, namely model building and gene prediction. The model building involves training by the input sequence, which optimizes the parameters of the model. In an actual gene prediction, the overlapping frames are “flagged” to alert the user for further inspection. Glimmer also has a variant, GlimmerM, for eukaryotic gene prediction.

8. RBS finder is a UNIX program that uses the prediction output from Glimmer and searches for the Shine–Dalgarno sequences in the vicinity of predicted start sites.
a) True
b) False

View Answer

Answer: a [Reason:] If a high-scoring site is found by the intrinsic probabilistic model, a start codon is confirmed. Otherwise, the program moves to other putative translation start sites and repeats the process.

Set 4

1. Which of the following is an incorrect statement about character-based methods?
a) They are also called discrete methods
b) They are based directly on the sequence characters rather than on pairwise distances
c) They do not count mutational events accumulated on the sequences
d) They may avoid the loss of information when characters are converted to distances

View Answer

Answer: c [Reason:] They count mutational events accumulated on the sequences. This preservation of character information means that evolutionary dynamics of each character can be studied. Ancestral sequences can also be inferred. The two most popular character-based approaches are the maximum parsimony (MP) and maximum likelihood (ML) methods.

2. Which of the following is an incorrect statement about the maximum parsimony method?
a) By cutting off the unnecessary variables, model development may become difficult, and there may be more chances of introducing inconsistencies, ambiguities, and redundancies, hence, the name Occam’s razor
b) In dealing with problems that may have an infinite number of possible solutions, choosing the simplest model may help to ‘cut off’ those variables that are not really necessary to explain the phenomenon
c) This method chooses a tree that has the fewest evolutionary changes or shortest overall branch lengths
d) It is based on a principle related to a medieval philosophy called Occam’s razor

View Answer

Answer: a [Reason:] The theory was formulated by William of Occam in the fourteenth century and states that the simplest explanation is probably the correct one. This is because the simplest explanation requires the fewest assumptions and the fewest leaps of logic.

3. Which of the following is an incorrect statement about how an MP tree is built?
a) It works by searching for all possible tree topologies and reconstructing ancestral sequences that require the minimum number of changes to evolve to the current sequences
b) Sites other than informative sites are non-informative; these are constant sites or sites that have changes occurring only once
c) Informative sites are the ones that can often be explained by a unique tree topology
d) Constant sites have the same state in all taxa and are quite useful in evaluating the various topologies

View Answer

Answer: d [Reason:] Constant sites have the same state in all taxa and are obviously useless in evaluating the various topologies. The sites that have changes occurring only once are not very useful either for constructing parsimony trees because they can be explained by multiple tree topologies. The non-informative sites are thus discarded in parsimony tree construction.
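The distinction between constant, non-informative, and informative sites can be made concrete with a small sketch. It uses the standard criterion that a site is parsimony-informative when at least two different states each occur in at least two taxa; the function name `classify_site` is hypothetical.

```python
from collections import Counter


def classify_site(column):
    """Classify one alignment column for parsimony analysis.

    column is a string with one character per taxon.
    """
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    shared = [state for state, n in counts.items() if n >= 2]
    return "informative" if len(shared) >= 2 else "non-informative"


for column in ("AAAA", "AAAG", "AAGG", "ACGT"):
    print(column, classify_site(column))
```

Columns such as `AAAG`, where the change occurs only once, fit every topology equally well with a single substitution, which is why they are discarded.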

4. Because these ancestral character states are not known directly, multiple possible solutions may exist. In this case, the parsimony principle applies to choose the character states that result in a minimum number of substitutions.
a) True
b) False

View Answer

Answer: a [Reason:] The inference of an ancestral sequence is made by first going from the leaves to internal nodes and to the common root to determine all possible ancestral character states. Then it goes back from the common root to the leaves to assign ancestral sequences that require the minimum number of substitutions.
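The leaves-to-root pass described above can be sketched with Fitch's algorithm for a single site, which is one common way to count the minimum number of substitutions on a fixed tree. The nested-tuple tree representation and the function name `fitch` are assumptions for illustration, and the second pass (root to leaves, assigning one state per ancestor) is omitted.

```python
def fitch(tree):
    """Return (possible_states, substitutions) for one site.

    tree is either a leaf character such as "A" or a pair (left, right)
    of subtrees. Hypothetical representation for illustration.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


print(fitch((("A", "A"), ("G", "G"))))
print(fitch((("A", "G"), ("A", "G"))))
```

For the same four leaf states, the first topology needs one substitution and the second needs two, so parsimony prefers the first.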

5. The unweighted method treats all mutations as equivalent.
a) True
b) False

View Answer

Answer: a [Reason:] This may be an oversimplification; mutations of some sites are known to occur less frequently than others, for example, transversions versus transitions, functionally important sites versus neutral sites. Therefore, a weighting scheme that takes into account the different kinds of mutations helps to select tree topologies more accurately. The MP method that incorporates a weighting scheme is called weighted parsimony.

6. Which of the following is an incorrect statement about tree-searching methods?
a) The choice of the first three taxa can be random
b) The parsimony method examines all possible tree topologies to find the maximally parsimonious tree.
c) It starts by building a three-taxon unrooted tree, for which only one topology is available
d) This is different from the exhaustive search method

View Answer

Answer: d [Reason:] This is an exhaustive search method. The next step is to add a fourth taxon to the existing branches, producing three possible topologies. The remaining taxa are progressively added to form all possible tree topologies. Obviously, this brute-force approach only works if there are relatively few sequences.
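Why the brute-force approach breaks down quickly can be seen by counting topologies. Adding the k-th taxon to an unrooted bifurcating tree of k - 1 taxa can be done on any of its 2k - 5 branches, so the number of topologies is the product of those factors. The function name `unrooted_topologies` is an assumption for illustration.

```python
def unrooted_topologies(taxa):
    """Number of unrooted bifurcating tree topologies for taxa >= 3."""
    count = 1  # three taxa: a single topology
    for k in range(4, taxa + 1):
        count *= 2 * k - 5  # branches available when adding taxon k
    return count


for n in (3, 4, 5, 10):
    print(n, unrooted_topologies(n))
```

With ten taxa there are already more than two million topologies, which is why shortcuts such as branch-and-bound or heuristic searches are needed.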

7. Which of the following is an incorrect statement about branch-and-bound?
a) It uses a shortcut to find an MP tree
b) It establishes an upper limit (or upper bound) for the number of allowed sequence variations
c) It solely uses UPGMA method
d) It starts by building a distance tree for all taxa involved

View Answer

Answer: c [Reason:] It starts by building a distance tree for all taxa involved using either NJ or UPGMA and then computing the minimum number of substitutions for this tree. The resulting number defines the upper bound to which any other trees are compared. The rationale is that a maximally parsimonious tree must be equal to or shorter than the distance-based tree.

8. The branch-and-bound method starts building trees in a similar way as in the exhaustive method.
a) True
b) False

View Answer

Answer: a [Reason:] The difference is that the previously established upper bound limits the tree growth. Whenever the overall tree length at any stage exceeds the upper bound, the topology search in that direction aborts. By doing so, it dramatically reduces the number of trees considered, and hence the computing time, while still guaranteeing that the most parsimonious tree is found.

9. When the number of taxa exceeds twenty, even the branch-and-bound method becomes computationally unfeasible.
a) True
b) False

View Answer

Answer: a [Reason:] A more heuristic search method must be used. A computer heuristic procedure is an approximation strategy to find an empirical solution for a complicated problem. This strategy generates quick answers, but not necessarily the best answer.

10. In a heuristic tree search, only a small subset of all possible trees is examined.
a) True
b) False

View Answer

Answer: a [Reason:] This method starts by carrying out a quick initial approximation, which is to build an NJ tree and subsequently modify it slightly into a different topology to see whether that leads to a shorter tree.

Set 5

1. Coiled coils are superhelical structures involving two or more interacting α-helices from the same or different proteins.
a) True
b) False

View Answer

Answer: a [Reason:] The individual α-helices twist and wind around each other to form a coiled bundle structure. The coiled coil conformation is important in facilitating inter- or intraprotein interactions. Proteins possessing these structural domains are often involved in transcription regulation or in the maintenance of cytoskeletal integrity.

2. Which of the following is true regarding coiled coils?
a) They have an integral repeat of twenty residues
b) They have an integral repeat of seven residues
c) They have an integral repeat of thirty residues
d) The sequence periodicity doesn’t contribute in designing algorithms to predict the structural domain.

View Answer

Answer: b [Reason:] Coiled coils have an integral repeat of seven residues (heptads) which assume a side-chain packing geometry at facing residues. For every seven residues, the first and fourth are hydrophobic, facing the helical interface; the others are hydrophilic and exposed to the solvent. The sequence periodicity forms the basis for designing algorithms to predict this important structural domain.
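The heptad periodicity can be illustrated with a crude check of how often the first and fourth positions of each seven-residue repeat are hydrophobic. This is only a sketch of the idea, not the scoring used by any coiled coil predictor; the hydrophobic residue set, the `offset` parameter, and the function name are assumptions.

```python
# Assumed set of hydrophobic residues, chosen for illustration only.
HYDROPHOBIC = set("AILMFVWY")


def heptad_hydrophobic_fraction(segment, offset=0):
    """Fraction of first and fourth heptad positions that are hydrophobic.

    offset is the index in segment where a heptad is assumed to start.
    """
    core = [res for i, res in enumerate(segment)
            if (i - offset) % 7 in (0, 3)]
    if not core:
        return 0.0
    return sum(res in HYDROPHOBIC for res in core) / len(core)


segment = "LKELEEK" * 3  # artificial heptad repeats
print(heptad_hydrophobic_fraction(segment, offset=0))
print(heptad_hydrophobic_fraction(segment, offset=1))
```

Trying every offset from 0 to 6 and keeping the highest fraction would be one simple way to find the most plausible heptad register.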

3. Which of the following is untrue regarding Coils?
a) It is a web-based program that detects coiled coil regions in proteins
b) It scans a window of fourteen, twenty-one, or twenty-eight residues
c) It scans a window of fourteen or twenty-one residues only
d) It compares the sequence to a probability matrix compiled from known parallel two-stranded coiled coils.

View Answer

Answer: c [Reason:] By comparing the similarity scores, the program calculates the probability of the sequence to adopt a coiled coil conformation. The program is accurate for solvent-exposed, left-handed coiled coils, but less sensitive for other types of coiled coil structures, such as buried or right-handed coiled coils.

4. In Multicoil, the scoring matrix is constructed based on a database of known three-stranded coiled coils only.
a) True
b) False

View Answer

Answer: b [Reason:] Multicoil is a web-based program for predicting coiled coils. The scoring matrix is constructed based on a database of known two-stranded and three-stranded coiled coils. The program is more conservative than Coils. It has been recently used in several genome-wide studies to screen for protein–protein interactions mediated by coiled coil domains.

5. Leucine zipper domains are a special type of coiled coils found in transcription regulatory proteins which contain two antiparallel α-helices held together by hydrophobic interactions of leucine residues.
a) True
b) False

View Answer

Answer: a [Reason:] The heptad repeat pattern is L-X(6)-L-X(6)-L-X(6)-L. This repeat pattern alone can sometimes allow the domain detection, albeit with high rates of false positives. The reason for the high false-positive rates is that the condition of the sequence region being a coiled coil conformation is not satisfied. To address this problem, algorithms have been developed that take into account both leucine repeats and coiled coil conformation to give accurate prediction.
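The leucine repeat pattern translates directly into a regular expression, as in the sketch below. The function name `leucine_repeat_starts` and the test sequence are made up; as the explanation notes, a pattern match alone gives many false positives because it does not check for a coiled coil conformation.

```python
import re

# L, six arbitrary residues, repeated: leucine at every seventh position.
# The lookahead lets overlapping matches be reported as well.
LEUCINE_REPEAT = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_repeat_starts(protein):
    """Return the start index of every match of the leucine repeat pattern."""
    return [m.start() for m in LEUCINE_REPEAT.finditer(protein)]


sequence = "MK" + "LAAAAAA" * 3 + "L" + "GG"  # artificial example
print(leucine_repeat_starts(sequence))
```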

6. Which of the following is untrue regarding PSIPRED?
a) It is a web-based program that predicts protein secondary structures
b) It uses a combination of evolutionary information and neural networks
c) It uses a combination of evolutionary information only
d) The multiple sequence alignment is derived from a PSI-BLAST database search

View Answer

Answer: c [Reason:] A profile is extracted from the multiple sequence alignment generated from three rounds of automated PSI-BLAST. The profile is then used as input for a neural network prediction similar to that in PHD, but without the jury layer. To achieve higher accuracy, a unique filtering algorithm is implemented to filter out unrelated PSI-BLAST hits during profile construction.

7. Prof is not similar to PHD.
a) True
b) False

View Answer

Answer: b [Reason:] Prof stands for Protein forecasting. It is an algorithm that combines PSI-BLAST profiles and a multistaged neural network, similar to that in PHD. In addition, it uses a linear discriminant function to discriminate between the three states.

8. Jpred combines the analysis results from six prediction algorithms, including PHD, PREDATOR, DSC, NNSSP, Jnet, and ZPred.
a) True
b) False

View Answer

Answer: a [Reason:] The query sequence is first used to search databases with PSI-BLAST for three iterations. Redundant sequence hits are removed. The resulting sequence homologs are used to build a multiple alignment from which a profile is extracted. The profile information is submitted to the six prediction programs. If there is sufficient agreement among the prediction programs, the majority of the prediction is taken as the structure.
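The consensus step can be sketched as a per-residue majority vote. This is an illustration of the idea only, not Jpred's actual procedure: the function name `consensus`, the `min_votes` threshold, the `?` placeholder for positions without sufficient agreement, and the example predictions are all assumptions.

```python
from collections import Counter


def consensus(predictions, min_votes):
    """Per-residue majority vote over equal-length prediction strings.

    Each string uses one letter per residue (for example H, E, C).
    Positions where the top state has fewer than min_votes are marked "?".
    """
    result = []
    for column in zip(*predictions):
        state, votes = Counter(column).most_common(1)[0]
        result.append(state if votes >= min_votes else "?")
    return "".join(result)


# Hypothetical outputs from four prediction programs for a four-residue query.
predictions = ["HHEC", "HHEC", "HEEC", "CHCC"]
print(consensus(predictions, min_votes=3))
print(consensus(predictions, min_votes=4))
```

Raising `min_votes` trades coverage for confidence: fewer residues receive a call, but those that do are supported by more of the programs.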