Multiple choice question for engineering
1. Which of the following is wrong about GenBank DNA Sequence Entry?
a) The information is organized into fields, each with an identifier, shown as the first text on each line
b) In some entries, these identifiers may be abbreviated to two letters, e.g., RF for reference
c) Some identifiers may have additional subfields
d) The CDS subfield in the field FEATURES does not offer the amino acid sequence
Answer: d [Reason:] The CDS subfield in the field FEATURES gives the amino acid sequence obtained by translation of known and potential open reading frames. The format of a database entry in GenBank, the NCBI nucleic acid and protein sequence database, is as follows: Information describing each sequence entry is given, including literature references, information about the function of the sequence, locations of mRNAs and coding regions, and positions of important mutations.
2. A consecutive set of three-letter words that could be codons specifying the amino acid sequence of a protein. The sequence entry is assumed by computer programs to lie between the identifiers “ORIGIN” and “//”.
Answer: a [Reason:] The sequence includes numbers on each line so that sequence positions can be located by eye. Because the sequence count or a sequence checksum value may be used by the computer program to verify the sequence composition, the sequence count should not be modified except by programs that also modify the count. The GenBank sequence format often has to be changed for use with sequence analysis software.
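As a rough illustration of why the GenBank sequence section usually has to be reformatted before analysis, the following Python sketch collects the text between ORIGIN and // and drops the position numbers and spacing. The function name and the tiny example entry are invented for illustration; this is a minimal sketch, not a full GenBank parser.

```python
def extract_origin_sequence(entry_text):
    """Return the sequence found between the ORIGIN line and the // terminator."""
    in_sequence = False
    parts = []
    for line in entry_text.splitlines():
        if line.startswith("ORIGIN"):
            in_sequence = True
            continue
        if line.startswith("//"):
            break
        if in_sequence:
            # Keep letters only: this drops the position numbers and the spacing.
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts)


# Hypothetical entry fragment, invented for illustration.
example_entry = """LOCUS       EXAMPLE
ORIGIN
        1 atggcgtacc ttgaa
       16 ccgta
//
"""

print(extract_origin_sequence(example_entry))
```

Note that a program editing the sequence this way would also need to update any stored sequence count or checksum.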
3. In the organization of the GenBank database and the search procedure used by ENTREZ, each row is another sequence entry and each column another GenBank field.
Answer: a [Reason:] When one sequence entry is retrieved, all of these fields will be displayed. A search for the term “SOS regulon and coli” in all fields will find two matching sequences. Finding these sequences is simple because indexes have been made listing all of the sequences that have any given term, one index for each field. Similarly, a search for transcriptional regulator will find three sequences.
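The idea of one index per field can be sketched in Python as follows. The entry identifiers, field names, and field contents are invented for illustration, and the code is only an assumption about how such an index might look, not how ENTREZ is actually implemented.

```python
from collections import defaultdict


def build_field_indexes(entries):
    """Build one index per field: field name -> term -> set of entry ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for entry_id, fields in entries.items():
        for field_name, text in fields.items():
            for term in text.lower().split():
                indexes[field_name][term].add(entry_id)
    return indexes


def search_all_fields(indexes, terms):
    """Return entries in which every term occurs in at least one field."""
    result = None
    for term in terms:
        hits = set()
        for field_index in indexes.values():
            hits |= field_index.get(term.lower(), set())
        result = hits if result is None else result & hits
    return result or set()


# Hypothetical entries, invented for illustration.
entries = {
    "seq1": {"DEFINITION": "lexA SOS regulon repressor", "ORGANISM": "Escherichia coli"},
    "seq2": {"DEFINITION": "recA SOS regulon protein", "ORGANISM": "Escherichia coli"},
    "seq3": {"DEFINITION": "transcriptional regulator", "ORGANISM": "Bacillus subtilis"},
}
indexes = build_field_indexes(entries)
print(sorted(search_all_fields(indexes, ["SOS", "regulon", "coli"])))
```

Because each term lookup is a dictionary access, the search never has to scan the entries themselves.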
4. Which of the following is wrong about European Molecular Biology Laboratory Data Library Format?
a) EMBL maintains DNA and protein sequence databases
b) As with GenBank entries, a large amount of information describing each sequence entry is given
c) Sequence entry includes literature references and information about the function of the sequence, but not locations of mRNAs and coding regions
d) Information is organized into fields, each with an identifier, shown as the first text on each line
Answer: c [Reason:] Sequence entry includes literature references and information about the function of the sequence, locations of mRNAs and coding regions and positions of important mutations. The sequence count or a checksum value for the sequence may be used by computer programs to make sure that the sequence is complete and accurate. For this reason, the sequence part of the entry should usually not be modified except with programs that also modify this count.
5. The format of an entry in the SwissProt protein sequence database is very similar to the EMBL format.
Answer: a [Reason:] The format is quite similar to the EMBL format, except that considerably more information about the physical and biochemical properties of the protein is provided. Also, the output of a DDBJ DNA sequence entry is almost identical to that of GenBank.
6. Which of the following is wrong about FASTA Sequence Format?
a) The FASTA sequence format includes a comment line identified by a “>” character in the first column followed by the name and origin of the sequence
b) The FASTA sequence format includes the sequence in standard one-letter symbols
c) This format provides a very convenient way to copy just the sequence part from one window to another because there are no numbers or other nonsequence characters within the sequence
d) The presence of ‘*’ is not quite essential for reading the sequence correctly by some sequence analysis programs
Answer: d [Reason:] The FASTA sequence format includes an optional ‘*’ that indicates the end of the sequence. It may or may not be present, and some sequence analysis programs require it in order to read the sequence correctly. The FASTA sequence format is similar to the protein information resource (NBRF) format except that the NBRF format includes a first line with a “>” character in the first column followed by information about the sequence, a second line containing an identification name for the sequence, and the third to last lines containing the sequence.
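A minimal Python sketch of reading FASTA text is shown below. It treats a trailing ‘*’ as an optional end-of-sequence marker and removes it. The function name and the example records are invented for illustration, and real FASTA readers handle more edge cases.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                # Drop the optional '*' end-of-sequence marker if present.
                records.append((header, "".join(chunks).rstrip("*")))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).rstrip("*")))
    return records


# Hypothetical records, invented for illustration.
example_fasta = ">seq1 hypothetical example\nMKTAYIAK\nQRQISFVK*\n>seq2\nACGT\n"
for name, sequence in read_fasta(example_fasta):
    print(name, sequence)
```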
7. Which of the following is wrong about National Biomedical Research Foundation/Protein Information Resource Sequence Format?
a) Sequences retrieved from the PIR database are not in this compact format, but in an expanded format with much more information about the sequence
b) The NBRF format is similar to the FASTA sequence format but with significant differences
c) This is different than PIR format
d) The first line includes an initial “>” character followed by a two-letter code such as P for complete sequence or F for fragment, followed by a 1 or 2 to indicate type of sequence, then a semicolon, then a four- to six-character unique name for the entry
Answer: c [Reason:] This sequence format is sometimes also called the PIR format. It has been used by the National Biomedical Research Foundation/Protein Information Resource (NBRF) and also by other sequence analysis programs.
8. In the Stanford University/Intelligenetics sequence format, a 1 is placed at the end of the sequence if the sequence is linear, and a 2 if the sequence is circular.
Answer: a [Reason:] The IG format was started by a molecular genetics group at Stanford University and subsequently continued by a company, Intelligenetics. It is similar to the PIR format, except that a semicolon is usually placed before the comment line. The identifier on the second line is also present.
9. Which of the following is wrong about Genetics Computer Group Sequence Format?
a) Earlier versions of the Genetics Computer Group (GCG) programs require a unique sequence format and include programs that convert other sequence formats into GCG format
b) Information about the sequence in the GenBank entry is not included but the line information is carried out
c) If one or more sequence characters become changed through error, a program reading the sequence will be able to determine that the change has occurred because the checksum value in the sequence entry will no longer be correct
d) Lines of information are terminated by two periods, which mark the end of information and the start of the sequence on the next line
Answer: b [Reason:] Information about the sequence in the GenBank entry is first included, followed by a line of information about the sequence and a checksum value. This value (not shown) is provided as a check on the accuracy of the sequence by the addition of the ASCII values of the sequence. If the sequence has not been changed, this value should stay the same.
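The idea of a checksum built by adding ASCII values can be sketched in Python as below. This is a simplified illustration based on the description above, assuming a plain sum over uppercase characters; the checksum actually computed by the GCG programs may differ in detail.

```python
def ascii_sum_checksum(sequence):
    """Add up the ASCII values of the uppercase sequence characters."""
    return sum(ord(ch) for ch in sequence.upper())


original = "ACGTACGT"   # hypothetical sequence, invented for illustration
altered = "ACGTACGA"    # last character changed through "error"

stored_checksum = ascii_sum_checksum(original)
print(stored_checksum)
print(ascii_sum_checksum(altered) == stored_checksum)
```

A plain sum like this catches a substituted character but not two swapped characters, which is one reason a real checksum may weight characters by position.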
10. Which of the following is wrong about Abstract Syntax Notation Sequence Format?
a) The information is much more difficult to read by eye than a GenBank formatted sequence
b) Abstract Syntax Notation (ASN.1) is a formal data description language that has been developed by the computer industry
c) All the information found in other forms of sequence storage, e.g., the GenBank format, is present. For example, sequences can be retrieved in this format by ENTREZ
d) Taxonomic information and bibliographic information cannot be encoded with this format
Answer: d [Reason:] ASN.1 has been adopted by the National Center for Biotechnology Information (NCBI) to encode data such as sequences, maps, taxonomic information, molecular structures, and bibliographic information. These data sets may then be easily connected and accessed by computers. The ASN.1 sequence format is a highly structured and detailed format especially designed for computer access to the data.
11. Which of the given statements is incorrect?
a) Before using a sequence file in a sequence analysis program, it is important to ensure that computer sequence files contain only sequence characters and not special characters used by text editors
b) Computer sequence files might contain special characters used by text editors
c) Editing a sequence file with a word processor can introduce such changes if one is not careful to work only with text or so-called ASCII files
d) Most text editors normally create text files that include control characters in addition to standard ASCII characters
Answer: b [Reason:] As option a and b contradict, option a being right, one should check for special characters. The control characters will only be recognized correctly by the text editor program. Sequence files that contain such control characters may not be analyzed correctly, depending on whether or not the sequence analysis program filters them out. Editors usually provide a way to save files with only standard ASCII characters, and these files will be suitable for most sequence analysis programs.
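One way to check a sequence file for characters that an analysis program might not expect is sketched below in Python. The choice of which control characters to allow (tab, line feed, carriage return) is an assumption made for this example.

```python
def find_suspect_characters(data):
    """Return (offset, byte value) pairs for bytes outside plain ASCII text."""
    allowed_control = {9, 10, 13}  # tab, line feed, carriage return
    suspects = []
    for offset, byte in enumerate(data):
        is_printable_ascii = 32 <= byte <= 126
        if not is_printable_ascii and byte not in allowed_control:
            suspects.append((offset, byte))
    return suspects


# Hypothetical file contents, invented for illustration.
clean = b">seq1\nACGT\n"
dirty = b">seq1\nAC\x00GT\xe2\x80\x9d\n"  # a NUL byte and a word-processor quote

print(find_suspect_characters(clean))
print(find_suspect_characters(dirty))
```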
12. Which of the given statements is incorrect about ASCII and Hexadecimal?
a) Computers store sequence information as simple rows of sequence characters called strings, which are similar to the sequences shown on the computer terminal
b) Each character is stored in binary code in the smallest unit of memory, called a byte
c) Each character is stored in binary code in the smallest unit of memory, called a bit
d) By convention, many of these combinations have a specific definition, called their ASCII equivalent
Answer: c [Reason:] Each character is stored in a byte, not in a single bit. Each byte comprises 8 bits, with each bit having a possible value of 0 or 1, producing 256 possible combinations. Some ASCII values are defined as keyboard characters, others as special control characters, such as signaling the end of a line (a line feed and a carriage return), or the end of a file full of text (end-of-file character). A file with only ASCII characters is called an ASCII file.
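The relationship between a character, its ASCII value, and the binary and hexadecimal forms of that value can be checked with a short Python sketch. The helper name is invented for illustration.

```python
def describe_character(ch):
    """Show the ASCII value of one character in decimal, binary, and hexadecimal."""
    code = ord(ch)
    return {
        "decimal": code,
        "binary": format(code, "08b"),  # the 8 bits of one byte
        "hex": format(code, "02X"),
    }


print(2 ** 8)                    # number of distinct values of an 8-bit byte
print(describe_character("A"))
print(describe_character("\n"))  # line feed, a control character
```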
13. Which of the given statements is untrue?
a) Sequence and other data files that contain non-ASCII characters also may not be transferred correctly from one machine to another and may cause unpredictable behavior of the communications software
b) The ASCII mode is useful for transferring text files, and the binary mode is useful for transferring compressed data files, which also contain non-ASCII characters
c) ASCII and binary modes cannot be set by the user
d) Most sequence analysis programs also require not only that a DNA or protein sequence file be a standard ASCII file, but also that the file be in a particular format such as the FASTA format
Answer: c [Reason:] The file transfer program (FTP) has ASCII and binary modes, which may be set by the user. Some communications software can be set to ignore such control characters. The use of windows on a computer has simplified such problems, since one merely has to copy a sequence from one window, for example, a window that is running a Web browser on the ENTREZ Web site, and paste it into another, for example, that of a translation program.
14. According to standard amino acid code letters which of the given pair is not right?
a) K- lysine
b) Y- tyrosine
c) Q- glutamine
d) R- serine
Answer: d [Reason:] In the standard single-letter amino acid code, R represents arginine, and serine is represented by S. In addition to the standard four base symbols, A, T, G, and C, the Nomenclature Committee of the International Union of Biochemistry has established a standard code to represent bases in a nucleic acid sequence that is uncertain or ambiguous.
15. For computer analysis of proteins, it is more convenient to use single-letter than three letter amino acid codes.
Answer: a [Reason:] For example, GenBank DNA sequence entries contain a translated sequence in single-letter code. The standard, single-letter amino acid code was established by a joint international committee.
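The convenience of the single-letter code can be illustrated with a short Python sketch that converts a three-letter sequence into single letters. The function name and the hyphen-separated input style are assumptions made for this example.

```python
# Standard three-letter to single-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence, separator="-"):
    """Convert e.g. 'Lys-Tyr-Gln' into 'KYQ'."""
    codes = three_letter_sequence.split(separator)
    return "".join(THREE_TO_ONE[code.capitalize()] for code in codes)


print(to_one_letter("Lys-Tyr-Gln-Arg-Ser"))
```

The single-letter string is one character per residue, so positions in the string correspond directly to positions in the protein.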
1. The Expectation Maximization algorithm has been used to identify conserved domains in unaligned proteins only.
Answer: b [Reason:] This algorithm has been used to identify both conserved domains in unaligned proteins and protein-binding sites in unaligned DNA sequences (Lawrence and Reilly 1990), including sites that may include gaps (Cardon and Stormo 1992). The input is a set of sequences that are expected to share a common sequence pattern that may not be easily recognizable by eye.
2. Which of the following is untrue regarding Expectation Maximization algorithm?
a) An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned
b) The alignment provides an estimate of the base or amino acid composition of each column in the site
c) The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
d) The row-by-column composition of the site already available is used to estimate the probability
Answer: d [Reason:] The EM algorithm then consists of two steps, which are repeated consecutively. In step 1, the expectation step, the column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences. These probabilities are used in turn to provide new information as to the expected base or amino acid distribution for each column in the site.
3. Out of the two repeated steps in EM algorithm, the step 2 is ________
a) the maximization step
b) the minimization step
c) the optimization step
d) the normalization step
Answer: a [Reason:] In step 2, the maximization step, the new counts of bases or amino acids for each position in the site found in step 1 are substituted for the previous set. Step 1 is then repeated using these new counts. The cycle is repeated until the algorithm converges on a solution and does not change with further cycles. At that time, the best location of the site in each sequence and the best estimate of the residue composition of each column in the site will be available.
4. In the EM algorithm, as an example, suppose that there are 10 DNA sequences having very little similarity with each other, each about 100 nucleotides long and thought to contain a binding site near the middle 20 residues, based on biochemical and genetic evidence. The following steps would be used by the EM algorithm to find the most probable location of the binding sites in each of the ______ sequences.
Answer: b [Reason:] When examining the EM program MEME, the size and number of binding sites, the location in each sequence, and whether or not the site is present in each sequence do not necessarily have to be known. For the present example, the following steps would be used by the EM algorithm to find the most probable location of the binding sites in each of the 10 sequences.
5. In the initial step of EM algorithm, the 20-residue-long binding motif patterns in each sequence are aligned as an initial guess of the motif.
Answer: a [Reason:] The base composition of each column in the aligned patterns is then determined. The composition of the flanking sequence on each side of the site provides the surrounding base or amino acid composition for comparison. Each sequence is assumed to be the same length and to be aligned by the ends.
6. In the intermediate steps of EM algorithm, the number of each base in each column is determined and then converted to fractions.
Answer: a [Reason:] For example, suppose that there are four Gs in the first column of the 10 sequences. Then the frequency of G in the first column of the site is fSG = 4/10 = 0.4. This procedure is repeated for each base and each column.
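Counting the bases in each column and converting the counts to fractions can be sketched in Python as follows. The ten three-residue sites are invented for illustration (a real site in this example would be 20 residues long), chosen so that the first column contains four Gs.

```python
from collections import Counter


def column_frequencies(aligned_sites, alphabet="ACGT"):
    """Return one dict of base frequencies per column of the aligned sites."""
    n_sites = len(aligned_sites)
    width = len(aligned_sites[0])
    frequencies = []
    for col in range(width):
        counts = Counter(site[col] for site in aligned_sites)
        frequencies.append({base: counts[base] / n_sites for base in alphabet})
    return frequencies


# Hypothetical aligned sites, invented for illustration.
sites = ["GAT", "GCT", "GAT", "GTT", "AAT", "CAT", "TAT", "AAC", "CAG", "TAA"]
freqs = column_frequencies(sites)
print(freqs[0]["G"])
```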
7. For the 100-residue DNA sequence example, there are _______ possible starting sites for a 20-residue-long site.
Answer: c [Reason:] For the 100-residue DNA sequence example, there are 100 - 20 + 1 = 81 possible starting sites for a 20-residue-long site. The first site begins at position 1 and ends at position 20, and the last begins at position 81 and ends at position 100 (there is not enough sequence for a 20-residue-long site beyond position 81).
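The count of starting sites follows from the general rule that a sequence of length L holds L - W + 1 windows of width W. A short Python sketch, with an invented helper name and 1-based positions, lists them for a 100-residue sequence and a 20-residue site:

```python
def site_windows(sequence_length, site_width):
    """List (start, end) positions, 1-based and inclusive, of every possible site."""
    last_start = sequence_length - site_width + 1
    return [(start, start + site_width - 1) for start in range(1, last_start + 1)]


windows = site_windows(100, 20)
print(len(windows))
print(windows[0], windows[-1])
```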
8. An alternative method is to produce an odds scoring matrix calculated by dividing each base frequency by the background frequency of that base.
Answer: a [Reason:] In this method, the probability of each location is then found by multiplying the odds scores from each column. An even simpler method is to use log odds scores in the matrix. The column scores are then simply added. In this case, the log odds scores must be converted to odds scores before position probabilities are calculated.
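The odds and log odds variants can be compared with a small Python sketch. The two-column frequency matrix and the uniform background of 0.25 are invented for illustration, and the final normalization of odds scores into position probabilities is one simple assumption about how the scores could be used. Frequencies of zero would need pseudocounts before taking logarithms, which this sketch does not handle.

```python
import math


def odds_matrix(frequencies, background):
    """Divide each column frequency by the background frequency of that base."""
    return [{b: col[b] / background[b] for b in col} for col in frequencies]


def odds_score(window, odds):
    """Multiply the odds scores of the bases found in each column."""
    score = 1.0
    for base, col in zip(window, odds):
        score *= col[base]
    return score


def log_odds_score(window, odds):
    """Add log2 odds scores instead of multiplying odds scores."""
    return sum(math.log2(col[base]) for base, col in zip(window, odds))


def position_probabilities(sequence, odds):
    """Normalize the odds score of every window so the values sum to 1."""
    width = len(odds)
    scores = [odds_score(sequence[i:i + width], odds)
              for i in range(len(sequence) - width + 1)]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical two-column site frequencies and background, invented for illustration.
frequencies = [{"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
               {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1}]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
odds = odds_matrix(frequencies, background)

print(odds_score("AG", odds))
print(2 ** log_odds_score("AG", odds))  # converting back gives the same odds score
print(position_probabilities("CAGT", odds))
```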
9. Which of the following about MEME is untrue?
a) It is a Web resource for performing local MSAs (multiple sequence alignments) by the expectation maximization method
b) It stands for Multiple EM for Motif Elicitation
c) It was developed at the University of California at San Diego Supercomputing Center
d) The Web page has multiple versions for searching blocks by an EM algorithm
Answer: d [Reason:] The Web page offers two versions of MEME: ParaMEME, a Web program that searches for blocks by an EM algorithm, and a similar program, MetaMEME, which searches for profiles using HMMs. The page also offers the Motif Alignment and Search Tool (MAST) for searching through databases for matches to motifs.
10. Which of the following about the Gibbs sampler is untrue?
a) It is a statistical method for finding motifs in sequences
b) It is dissimilar to the principle of the EM method
c) It searches for the statistically most probable motifs
d) It can find the optimal width and number of given motifs in each sequence
Answer: b [Reason:] The Gibbs sampler is another statistical method for finding motifs in sequences. The method is similar in principle to the EM method described above, but the algorithm is different. A combinatorial approach of the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site.
1. The truly statistically significant sequence alignment will be able to provide evidence of homology between the sequences involved.
Answer: a [Reason:] When given a sequence alignment showing a certain degree of similarity, it is often important to determine whether the observed sequence alignment can occur by random chance or the alignment is indeed statistically sound. When a statistically significant sequence alignment is under consideration, it will be able to provide evidence of homology between the sequences involved.
2. By calculating alignment scores of a large number of ______ sequence pairs, a distribution model of the ______ sequence scores can be derived.
a) related, randomized
b) unrelated, randomized
c) unrelated, unrandomized
d) related, unrandomized
Answer: b [Reason:] Solving the statistical significance problem requires a statistical test of the alignment scores of two unrelated sequences of the same length. From the distribution, a statistical test can be performed based on the number of standard deviations from the average score.
3. Many studies have demonstrated that the distribution of similarity scores assumes a peculiar shape that resembles a highly skewed normal distribution with a long tail on one side. The distribution matches the _______
a) Gumbel elective value distribution
b) Gumbel extreme void distribution
c) Gumbel end value distribution
d) Gumbel extreme value distribution
Answer: d [Reason:] The distribution pattern described matches the Gumbel extreme value distribution, for which a mathematical expression is available. This means that, given a sequence similarity value, the statistical significance can be accurately estimated by using the mathematical formula for the extreme value distribution.
4. Which of the following is a part of the statistical test of sequences?
a) An optimal alignment between two chosen sequences is obtained at the end
b) Unrelated sequences of the same length are then generated through a randomization process
c) Unrelated sequences of the different length are then generated through a randomization process
d) Related sequences of the same length are then generated through a randomization process
Answer: b [Reason:] Unrelated sequences of the same length are then generated through a randomization process in which one of the two sequences is randomly shuffled. And the next step is that a new alignment score is computed for the shuffled sequence pair.
5. The statistical test uses a randomization process in which one of the two given sequences is randomly shuffled.
Answer: a [Reason:] After this step, the alignment score for the shuffled sequence pair is computed. More such scores are then obtained in the same way through repeated shuffling.
6. What is used to generate parameters for the extreme distribution?
a) The pool of alignment scores from the shuffled sequences
b) A single score of a shuffled sequence
c) The pool of alignment scores from the unshuffled sequences
d) The basic optimal score computed at the beginning of the test
Answer: a [Reason:] Maximum scores are obtained through repeated shuffling. Then the pool of alignment scores from the shuffled sequences is used to generate parameters for the extreme distribution. The original alignment score is then compared against the distribution of random alignments to determine whether the score is beyond random chance.
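The shuffling test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the scoring values (match 2, mismatch -1, gap -2), the number of shuffles, the example sequences, and the method-of-moments fit of the extreme value distribution are all choices made for this illustration, not the procedure of any particular program.

```python
import math
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score with a linear gap penalty (score only, no traceback)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def shuffle_test(a, b, n_shuffles=200, seed=0):
    """Compare the observed score with scores of shuffled versions of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(local_alignment_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; cannot fit a distribution")
    # Assumed method-of-moments fit of an extreme value (Gumbel) distribution.
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - 0.5772156649 * beta
    # P(score >= observed) = 1 - exp(-exp(-(observed - mu) / beta))
    p_value = -math.expm1(-math.exp(-(observed - mu) / beta))
    return observed, mean, sd, p_value


# Hypothetical sequences, invented for illustration.
seq_a = "ACGTTGCAATGCCGTAAGCTTAGC"
seq_b = "ACGTTGCAATGCCGTAAGCTTAGC"
observed, mean, sd, p_value = shuffle_test(seq_a, seq_b)
print(observed, round(mean, 2), round(sd, 2), p_value)
```

Fixing the random seed makes the run repeatable; a real analysis would typically use many more shuffles.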
7. If the score is located in the extreme margin of the distribution, that means that the alignment between the two sequences is ______ due to random chance and is thus considered ______
a) unlikely, significant
b) unlikely, insignificant
c) very likely, insignificant
d) very likely, significant
Answer: a [Reason:] A score in the extreme margin of the distribution is unlikely to arise by random chance, so the alignment is considered significant. A P-value is given to indicate the probability that the original alignment is due to random chance.
8. It is not known whether the Gumbel distribution applies equally well to gapped alignments.
Answer: a [Reason:] The statistics in the test were derived from ungapped local sequence alignments. Hence, it is not known whether the Gumbel distribution applies equally well to gapped alignments. However, for all practical purposes, it is reasonable to assume that scores for gapped alignments essentially fit the same distribution. A frequently used software program for assessing statistical significance of a pairwise alignment is the PRSS program.
9. Which of the following is untrue about the PRSS program?
a) It stands for Probability of Random Shuffles
b) It is a web-based program that can be used to evaluate the statistical significance of DNA or protein sequence alignment
c) It first aligns two sequences using the Needleman-Wunsch algorithm and calculates the score
d) It holds one sequence in its original form and randomizes the order of residues in the other sequence.
Answer: c [Reason:] It first aligns two sequences using the Smith–Waterman algorithm and calculates the score. The shuffled sequence is realigned with the unshuffled sequence. The resulting alignment score is recorded. This process is iterated many (normally 1,000) times to help generate data for fitting the Gumbel distribution.
10. The major disadvantage of the PRSS program is that it doesn’t allow partial shuffling.
Answer: b [Reason:] The major feature of the program is that it allows partial shuffling. For example, shuffling can be restricted to residues within a local window of 25–40, whereas the residues outside the window remain unchanged.
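One possible reading of partial shuffling is sketched below in Python: the sequence is cut into consecutive windows and residues are shuffled only within each window, so local composition is preserved. The function name, the default window size, and this interpretation are assumptions for illustration and are not taken from the PRSS source.

```python
import random


def window_shuffle(sequence, window=25, seed=None):
    """Shuffle residues only within consecutive windows of the given size."""
    rng = random.Random(seed)
    pieces = []
    for start in range(0, len(sequence), window):
        block = list(sequence[start:start + window])
        rng.shuffle(block)
        pieces.append("".join(block))
    return "".join(pieces)


# Hypothetical sequence, invented for illustration.
sequence = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 2
shuffled = window_shuffle(sequence, window=10, seed=1)
print(shuffled)
```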
1. Which of the following is untrue?
a) Many entries in the Protein DataBank (PDB) are three-dimensional structures of multiple domains
b) The structures in PDB provide experimental information about interactions between domains at atomic detail
c) There are comparatively few three-dimensional structures compared to the amount of data available from the lower resolution large-scale experiments
d) Many entries in the Protein DataBank (PDB) are two-dimensional structures of multiple domains
Answer: d [Reason:] Analysis of structures consisting of multiple domains has uncovered some of the principles of domain interactions in three dimensions. This information can therefore be complementary to the experimental data on protein interactions and to the predicted interactions.
2. In protein domain family interaction map, the physical contacts of the domains in different families are represented by the lines between the nodes.
Answer: a [Reason:] Each node in this graph represents a protein domain family. There are a few families that are hubs in the network: these are large families that are functionally versatile, such as Rossmann domains indicated by an ‘R’ here. Most families engage in only one or two types of interactions.
3. In the Interaction map of domain families, the interactions of one family represent the sum of all the interactions of domains in that family.
Answer: a [Reason:] To study the large-scale patterns and evolution of interactions between protein domains, the interactions in terms of the domain families can be summarized. Thus the interactions of one family represent the sum of all the interactions of domains in that family. Precise information about contacts between individual domains can be extracted by analysis of PDB entries.
4. Most domain families only interact with one or two other families, while a few families are extremely versatile in their interactions and are connected to many families.
Answer: a [Reason:] The result of the known interactions between members of structural protein families is a graph of connections between families, where the nodes are protein families and the edges represent an interaction between at least one of the domains from each of the two families. This pattern is observed at the level of individual proteins as well, as similar networks can be constructed for the individual proteins in the yeast proteome, for instance.
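The family interaction graph can be sketched in Python as a set of neighbours per family, from which the number of distinct interaction partners of each family follows. The family names and interactions below are invented for illustration; a family interacting with itself is counted as one partner type.

```python
from collections import defaultdict


def family_degrees(interactions):
    """Count the distinct partner families of each family (nodes and edges)."""
    neighbours = defaultdict(set)
    for family_a, family_b in interactions:
        neighbours[family_a].add(family_b)
        neighbours[family_b].add(family_a)
    return {family: len(partners) for family, partners in neighbours.items()}


# Hypothetical interactions between domain families, invented for illustration.
interactions = [
    ("Rossmann", "CatalyticA"),
    ("Rossmann", "CatalyticB"),
    ("Rossmann", "CatalyticC"),
    ("CatalyticA", "CatalyticA"),  # a family interacting with itself
    ("FamilyX", "FamilyY"),
]
degrees = family_degrees(interactions)
hubs = [family for family, degree in degrees.items() if degree >= 3]
print(degrees)
print(hubs)
```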
5. Almost ______ engage in interactions with domains from their own family when one includes oligomeric proteins.
a) one fifth of all known families
b) one fourth of all known families
c) all of all known families
d) half of all known families
Answer: d [Reason:] In this case, half of all known families engage in interactions with domains from their own family. Such symmetrical interactions appear to be particularly favorable.
6. In order to understand the geometry of domain combinations, different structures of homologous pairs of domains must be studied.
Answer: a [Reason:] This is important, because though the methods for structure prediction of individual domains are well established, much less is known about assemblies of domains. The network of domain family interactions is a purely two-dimensional map: it lays out the connections between families but does not provide information on the three-dimensional geometry of interactions.
7. The investigation (Aloy and Russel) of domain combinations in multidomain proteins by Bashton and Chothia focuses on two-domain proteins belonging to the Rossmann domain family.
Answer: a [Reason:] These proteins generally consist of one Rossmann domain and one catalytic domain. As for the analysis of transient interactions, all the proteins belonging to one family of catalytic domains form the same type of interface to the Rossmann domains.
8. The linkers between the catalytic domain and the Rossmann domain were conserved in each family.
Answer: a [Reason:] This means that interface conservation within one catalytic family is a result of the direct evolutionary relationship between the proteins that have a particular pair of domains. In other words, each set of Rossmann domain proteins with a particular catalytic domain has descended from one common ancestral recombination event.
9. Across the different types of catalytic families, the position of the two domains with respect to one another varied, but only within a range of about ______
Answer: c [Reason:] This is the result of a functional constraint in these enzymes: the catalytic domain can only take up a limited variety of positions, as the substrate needs to be held sufficiently close to the NAD(P) cofactor of the Rossmann domain. In other multidomain proteins where there is no such strict functional constraint, the domain interfaces of one domain family to other families may well be more variable.
1. The classic protein separation methods involve two-dimensional gel electrophoresis followed by gel image analysis.
Answer: a [Reason:] Further characterization involves determination of amino acid composition, peptide mass fingerprints, and sequences using mass spectrometry (MS). Finally, database searching is needed for protein identification.
2. Which of the following is incorrect regarding 2D-PAGE?
a) It stands for Two-dimensional polyacrylamide gel electrophoresis
b) It separates proteins by charge only
c) The gel is run in one direction in a pH gradient under a non-denaturing condition
d) It works to separate proteins by isoelectric points (pI)
Answer: b [Reason:] It is a high-resolution technique that separates proteins by charge and mass. It works to separate proteins by isoelectric points (pI) and then in an orthogonal dimension under a denaturing condition to separate proteins by molecular weights (MW). This is followed by staining, usually silver staining, which is very sensitive, to reveal the position of all proteins. The result is a two-dimensional gel map; each spot on the map corresponds to a single protein being expressed.
3. Which of the following is incorrect regarding 2D-PAGE?
a) Not all proteins can be separated by this method or stained properly
b) The stained gel can be scanned and digitized for image analysis
c) Membrane proteins are largely hydrophilic and readily solubilized
d) One of the challenges of this technique is the separation of membrane proteins
Answer: c [Reason:] Membrane proteins are largely hydrophobic and not readily solubilized. They tend to aggregate in the aqueous medium of a two-dimensional gel. To overcome this problem, membrane proteins can be fractionated using specialized protocols and then electrophoresed using optimized buffers containing zwitterionic detergents. Subfractionation can be carried out to separate nuclear, cytosol, cytoskeletal, and other subcellular fractions to boost the concentrations of rare proteins and to reveal subcellular localizations of the proteins.
4. Comparing two-dimensional gel images from various experiments can sometimes pose a challenge because the gels, unlike DNA microarrays, may shrink or warp.
Answer: a [Reason:] This requires the software programs to be able to stretch or maneuver one of the gels relative to the other to find a common geometry. When the reference spots are aligned properly, the rest of the spots can be subsequently compared automatically.
5. Which of the following is incorrect regarding Mass Spectrometry Protein Identification?
a) The proteolysis doesn’t generate a pattern according to molecular weight
b) Proteins can be identified and characterized using MS
c) The proteins from a two dimensional gel system are first digested in situ with a protease
d) Protein spots of interest are excised from the two-dimensional gel
Answer: a [Reason:] The proteolysis generates a unique pattern of peptide fragments of various MWs, which is termed a peptide fingerprint. The fragments can be analyzed with MS, a high-resolution technique for determining molecular masses. Currently, electrospray ionization MS and matrix-assisted laser desorption ionization (MALDI) MS are commonly used.
6. Electrospray ionization MS and matrix-assisted laser desorption ionization (MALDI) MS only differ in the ionization procedure used.
Answer: a [Reason:] In MALDI-MS, for example, the peptides are charged with positive ions and forced through an analyzing tube with a magnetic field. Peptides are analyzed in the gas phase. Because smaller peptides are deflected more than larger ones in a magnetic field, the peptide fragments can be separated according to molecular mass and charges. A detector generates a spectrum that displays ion intensity as a function of the mass-to-charge ratio.
7. Which of the following is incorrect regarding the Protein Identification through Database Searching?
a) MS characterization of proteins is highly dependent on bioinformatic analysis
b) Bioinformatics programs can be used to search for the identity of a protein in a database of theoretically digested proteins
c) Even in reality, the protease digestion is always perfect in MS
d) The purpose of the database search is to find exact or nearly exact matches
Answer: c [Reason:] In reality, protease digestion is rarely perfect, often generating partially digested products as a result of missed cuts at expected cutting sites. Peptides resulting from MALDI-MS are also charged, which increases their mass slightly.
8. ExPASy is a comprehensive proteomics web server with a suite of programs for searching peptide information from the SWISS-PROT and TrEMBL databases.
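A theoretical digest that allows for missed cuts can be sketched in Python as below. The protease rule used here (trypsin cutting after K or R unless the next residue is P) is an assumption chosen for illustration, as are the function name and the example protein; the text above does not name a specific protease.

```python
def tryptic_peptides(protein, max_missed=1):
    """Return peptides from an in silico digest, allowing missed cleavage sites."""
    # Assumed rule: cut after K or R, except when the next residue is P.
    cut_points = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(protein))

    peptides = []
    for start_idx in range(len(cut_points) - 1):
        for missed in range(max_missed + 1):
            end_idx = start_idx + 1 + missed
            if end_idx < len(cut_points):
                peptides.append(protein[cut_points[start_idx]:cut_points[end_idx]])
    return peptides


# Hypothetical protein sequence, invented for illustration.
protein = "MKTAYRPGKLLR"
print(tryptic_peptides(protein, max_missed=0))
print(tryptic_peptides(protein, max_missed=1))
```

A database of theoretically digested proteins built with missed cuts allowed gives partially digested products a chance to match as well.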
Answer: a [Reason:] There are twelve database search tools in this server dedicated to protein identification based on MS data. For example, the AACompIdent program identifies proteins based on pI, MW, and amino acid composition and compares these values with theoretical compositions of all proteins in SWISS-PROT/TrEMBL.
9. Which of the following is incorrect regarding Mascot and ProFound?
a) ProFound is a web server with a set of interconnected programs
b) ProFound searches a protein sequence database using MS fingerprinting information
c) Bayesian algorithm is not involved in ProFound
d) Mascot is a web server that identifies proteins based on peptide mass fingerprints, sequence entries, or raw MS/MS data from one or more peptides
Answer: c [Reason:] ProFound uses a Bayesian algorithm. It ranks the database matches according to the probability of database sequences producing the peptide mass fingerprints.
10. Which of the following is incorrect regarding Differential In-Gel Electrophoresis?
a) Proteins are mixed together before electrophoresis on a two-dimensional gel
b) Differentially expressed proteins in both conditions can’t be visualized in the same gel
c) In this method, differences in protein expression patterns can be detected in a similar way as in fluorescent-labeled DNA microarrays
d) Proteins from experimental and control samples are labeled with differently colored fluorescent dyes
Answer: b [Reason:] Differentially expressed proteins in both conditions can be co-separated and visualized in the same gel. Compared to regular 2D-PAGE, the process reduces the noise and improves the reproducibility and sensitivity of detection. In principle, it resembles the two-color DNA microarray analysis. The drawbacks of this approach are that different proteins take up fluorescent tags to different extents and that some proteins labeled with the fluorophores may become less soluble and precipitate before electrophoresis.