Database MCQ Number 00905

Database MCQ Set 1

1. Two common goals in sequence analysis are to identify sequences that encode proteins, which determine all cellular metabolisms, and to discover sequences that regulate the expression of genes or other cellular processes.
a) True
b) False

Answer

Answer: a [Reason:] Genomic sequencing meets both goals. However, only a small percentage of the genomic sequence of many organisms actually encodes proteins because of the presence of introns within coding regions and other noncoding regions in the genome.

2. cDNA libraries have been prepared that have the same sequences as the mRNA molecules produced by organisms, or else cDNA copies are obtained directly by RT-PCR (copying of mRNA by reverse transcriptase, followed by amplification of the cDNA copy by the polymerase chain reaction) and then sequenced.
a) True
b) False

Answer

Answer: a [Reason:] There has been a great deal of progress in developing computational methods for analyzing genomic sequences and finding these protein-encoding regions. But these methods are not completely reliable and, furthermore, such genomic sequences are often not available.

3. Using cDNA sequence with the ____ it is much simpler to locate protein-encoding sequences in these molecules.
a) exons taken out
b) exons removed
c) introns added
d) introns removed

Answer

Answer: d [Reason:] The only possible difficulty is that a gene of interest may be developmentally expressed or regulated in such a way that the mRNA is not present. This problem has been circumvented by pooling mRNA preparations from tissues that express a large proportion of the genome, from a variety of tissues and developing organs or from organisms subjected to several environmental influences.

4. An important development for computational purposes was the decision by Craig Venter to prepare databases of partial sequences of the expressed genes, called expressed sequence tags or ESTs.
a) True
b) False

Answer

Answer: a [Reason:] ESTs were an important development for computational analysis. Each EST has just enough DNA sequence to give a good idea of the protein sequence.

5. The translated sequence can then be compared to a database of protein sequences with the hope of finding a strong similarity to a protein of known function, and hence to identify the function of the cloned EST.
a) True
b) False

Answer

Answer: a [Reason:] Comparing the translated sequence with a protein database in this way can identify the function of the cloned EST. The corresponding cDNA clone of the gene of interest can then be obtained and the gene completely sequenced.
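
The translation step described above can be sketched in Python. This is a minimal illustration, not a database search: it only produces the six reading-frame translations that would then be compared with protein sequences. The function names and the example EST fragment are hypothetical, invented for this sketch.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, frame=0):
    """Translate one reading frame; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames

# Hypothetical EST fragment, invented for illustration.
for label, protein in six_frame_translations("ATGGCCAAATAA").items():
    print(label, protein)
```

All six frames are produced because the strand and reading frame of an EST are generally not known in advance; "*" marks a stop codon.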

6. Investigators are encouraged to submit their newly obtained sequences directly to a member of the International Nucleotide Sequence Database Collaboration, such as the NCBI, DDBJ, and EMBL.
a) True
b) False

Answer

Answer: a [Reason:] NCBI stands for National Center for Biotechnology Information; it manages GenBank. DDBJ and EMBL stand for DNA Data Bank of Japan and European Molecular Biology Laboratory, respectively.

7. NCBI reviews new entries and updates existing ones, as requested.
a) True
b) False

Answer

Answer: a [Reason:] A database accession number, which is required to publish the sequence, is provided. New sequences are exchanged daily by the GenBank, EMBL, and DDBJ databases.

8. Which of the given statements is incorrect?
a) The simplest and newest way of submitting sequences is through the Web site on a Web form page called BankIt
b) The sequence can also be annotated with information about the sequence, such as mRNA start and coding regions
c) The submitted form is transformed into GenBank format and returned to the submitter for review before being added to GenBank
d) Sequin does not run on UNIX

Answer

Answer: d [Reason:] The other method of submission is to use Sequin (formerly called Authorin), which runs on personal computers and UNIX machines. The program provides an easy-to-use graphic interface and can manage large submissions such as genomic sequence information.

9. Which of the given statements is untrue?
a) There is no detailed check of sequence accuracy prior to submission to GenBank and other databases
b) Often, a sequence is submitted at the time of publication of the sequence in a journal article, providing a certain level of checking by the editorial peer review process
c) No sequence is submitted without being published or prior to publication
d) In laboratories performing large sequencing projects, such as those engaged in the Human Genome Project or the genome projects of model organisms, the granting agency requires a certain level of accuracy of the order of 1 possible error per 10 kb

Answer

Answer: c [Reason:] Many sequences are submitted without being published or prior to publication. As mentioned in option d, the level of accuracy should be sufficient for most sequence analysis applications such as sequence comparisons, pattern searching, and translation.

10. Granting agencies require a certain level of sequence accuracy. Which of the given statements about sequencing errors is untrue?
a) In other laboratories, such as those performing a single-attempt sequencing of ESTs, the error rate may be much higher, approximately 1 in 100, including incorrectly identified bases and inserted or deleted bases
b) Incorrect bases always translate to the right amino acid
c) Base insertions/deletions will cause frame-shifts in the sequence
d) Making alignment with a protein sequence becomes difficult because of frameshifts

Answer

Answer: b [Reason:] In translating EST sequences in GenBank and other databases, incorrect bases may translate to the wrong amino acid. Another type of database sequence that is error-prone is a fragment of sequence from the immunological variant of a pathogenic organism, such as the regions in the protein coat of the human immunodeficiency virus (HIV). Although this low level of accuracy may be suitable for some purposes such as identification, for more detailed analyses, e.g., evolutionary analyses, the accuracy of such sequence fragments should be verified.
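
The difference between the two error types can be seen in a small Python sketch. The sequences below are hypothetical, invented for illustration: a single incorrect base changes one amino acid, while a single inserted base shifts the reading frame and changes every codon after it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate from the first base; any trailing partial codon is ignored."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def count_differences(a, b):
    """Count positions at which two strings differ (up to the shorter one)."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Hypothetical coding sequence and two erroneous reads of it.
original = "ATGGCTGAAACCGGTTAA"
substituted = original[:4] + "A" + original[5:]  # one incorrect base
inserted = original[:4] + "A" + original[4:]     # one inserted base

for name, dna in [("original", original),
                  ("substitution", substituted),
                  ("insertion", inserted)]:
    print(f"{name:12s} {translate(dna)}")
```

In this example the substitution alters a single residue, whereas the insertion alters all residues downstream and introduces a premature stop codon, which is why frameshifts make alignment with a protein sequence difficult.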

Database MCQ Set 2

1. A main application of pairwise alignment is retrieving biological sequences in databases based on similarity.
a) True
b) False

Answer

Answer: a [Reason:] This process involves submission of a query sequence and performing a pairwise comparison of the query sequence with all individual sequences in a database. Thus, database similarity searching is pairwise alignment on a large scale. This type of searching is one of the most effective ways to assign putative functions to newly determined sequences.

2. The dynamic programming method is the fastest and most practical method for database searching.
a) True
b) False

Answer

Answer: b [Reason:] The dynamic programming method is slow and impractical to use in most cases. Special search methods are needed to speed up the computational process of sequence comparison.

3. Which of the following is not one of the requirements for implementing algorithms for sequence database searching?
a) Size of the dataset
b) Sensitivity
c) Specificity
d) Speed

Answer

Answer: a [Reason:] Algorithms for sequence database searching have three requirements: sensitivity, specificity, and speed. The size of the dataset is not itself a requirement, although speed can vary with the size of the database. Achieving all three at once is nearly impossible.

4. Sensitivity refers to the ability to find as many correct hits as possible.
a) True
b) False

Answer

Answer: a [Reason:] Among the unique requirements for implementing algorithms for sequence database searching, the first criterion is sensitivity, which refers to the ability to find as many correct hits as possible. It is measured by the extent of inclusion of correctly identified sequence members of the same family. These correct hits are considered ‘true positives’ in the database searching exercise.

5. The specificity refers to the ability to include incorrect hits.
a) True
b) False

Answer

Answer: b [Reason:] In heuristic database searching methods, the second criterion is selectivity, also called specificity, which refers to the ability to exclude incorrect hits. These incorrect hits are unrelated sequences mistakenly identified in database searching and are considered ‘false positives.’

6. In heuristic methods, speed doesn’t vary with the size of database.
a) True
b) False

Answer

Answer: b [Reason:] The speed is the time it takes to get results from database searches. Depending on the size of the database, speed can sometimes be a primary concern in the search methods.

7. An increase in sensitivity is associated with _______ in selectivity.
a) no specific change
b) increase
c) decrease
d) exponential increase

Answer

Answer: c [Reason:] Ideally, one wants to have the greatest sensitivity, selectivity, and speed in database searches. However, satisfying all three requirements is difficult in reality. What generally happens is that an increase in sensitivity is associated with a decrease in selectivity. A very inclusive search tends to include many false positives. Similarly, an improvement in speed often comes at the cost of lowered sensitivity and selectivity. A compromise between the three criteria often has to be made.
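
The trade-off can be illustrated with a short Python sketch. It assumes one common formalization, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and uses a hypothetical list of scored hits invented for this example; neither the scores nor the thresholds come from a real search program.

```python
def sensitivity(tp, fn):
    """Fraction of true family members that were found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of unrelated sequences that were correctly excluded."""
    return tn / (tn + fp)

def evaluate(hits, threshold):
    """hits is a list of (score, is_true_homolog); a hit is reported
    when its score is at or above the threshold."""
    tp = sum(1 for score, related in hits if score >= threshold and related)
    fn = sum(1 for score, related in hits if score < threshold and related)
    fp = sum(1 for score, related in hits if score >= threshold and not related)
    tn = sum(1 for score, related in hits if score < threshold and not related)
    return sensitivity(tp, fn), specificity(tn, fp)

# Hypothetical search results: (alignment score, truly related?).
HITS = [(90, True), (75, True), (60, True), (55, False),
        (40, True), (35, False), (20, False), (10, False)]

for threshold in (70, 50, 30):
    sens, spec = evaluate(HITS, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Lowering the threshold makes the search more inclusive, so sensitivity rises while specificity falls, which is the compromise described in the answer.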

8. Which of the following is incorrect?
a) Smith–Waterman algorithm is the fastest
b) Smith–Waterman algorithm is comparatively slower method
c) To speed up comparison, heuristic methods are used
d) Heuristic algorithms perform faster searches

Answer

Answer: a [Reason:] Searching a large database using dynamic programming methods, such as the Smith–Waterman algorithm, although accurate and reliable, is too slow and impractical when computational resources are limited. To speed up the comparison, heuristic methods have to be used. The heuristic algorithms perform faster searches because they examine only a fraction of the possible alignments examined in regular dynamic programming.

9. Currently, there are two major heuristic algorithms for performing database searches: BLAST and FASTA.
a) True
b) False

Answer

Answer: a [Reason:] These methods are not guaranteed to find the optimal alignment or true homologs, but are 50–100 times faster than dynamic programming. The increased computational speed comes at a moderate expense of sensitivity and specificity of the search, which is easily tolerated by working molecular biologists. Both programs can provide a reasonably good indication of sequence similarity by identifying similar sequence segments.

10. Which of the following is incorrect about the ‘word’ method?
a) Both BLAST and FASTA use a heuristic word method
b) The word method is used for fast pairwise sequence alignment in BLAST and FASTA
c) The basic assumption is that two related sequences must have at least one word in common
d) The basic assumption is that two related sequences have no words in common

Answer

Answer: d [Reason:] This is the third method of pairwise sequence alignment. It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, which are similar to the windows used in the dot matrix method. The basic assumption is that two related sequences must have at least one word in common. By first identifying word matches, a longer alignment can be obtained by extending similarity regions from the words. Once regions of high sequence similarity are found, adjacent high-scoring regions can be joined into a full alignment.
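
The word idea can be sketched in Python as follows. This is a simplified illustration that uses exact word matches and ungapped extension over identical letters only; it is not the actual BLAST or FASTA implementation, and the function names, word length, and example sequences are assumptions made for this sketch.

```python
from collections import defaultdict

def word_index(seq, k):
    """Map every word of length k in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where the same word occurs."""
    index = word_index(subject, k)
    matches = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            matches.append((i, j))
    return matches

def extend(query, subject, i, j, k):
    """Extend a word match in both directions while letters stay identical."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + k, j + k
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]

# Hypothetical sequences, invented for illustration.
query, subject = "GATTACAGG", "CCATTACATT"
seeds = shared_words(query, subject, k=3)
print(seeds)
print(extend(query, subject, *seeds[0], 3))
```

Indexing the subject once means each query word is looked up directly rather than compared against every position, which is where the speed gain over full dynamic programming comes from. Two sequences with no word in common produce no seeds and are never aligned.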

Database MCQ Set 3

1. Which of the given statements is incorrect about the Block multiple sequence alignment format?
a) The ID line contains a short identifier for the group of sequences from which the block was made, which is often the original Prosite group ID
b) The identifier is terminated by a comma, and “BLOCK” indicates the entry type
c) AC contains the block number, a seven-character group number for sequences from which the block was made, followed by a letter (A–Z) indicating the order of the block in the sequences
d) The block number is a 5-digit number preceded by BL (BLOCKS database) or PR (PRINTS database)

Answer

Answer: b [Reason:] The identifier is terminated by a semicolon, and “BLOCK” indicates the entry type. The values (min, max) give the minimum and maximum number of amino acids from the previous block or from the start of the sequence. DE describes the sequences from which the block was made.

2. BL contains information about the block: xxx is the amino acids in the spaced triplet found by MOTIF upon which the block is based.
a) True
b) False

Answer

Answer: a [Reason:] In addition to this, w is the width of the sequence segments (columns) in the block. s is the number of sequence segments (rows) in the block. Other values (n1, n2) describe statistical features of the block. Sequence id is a list of sequences. Each sequence line contains a sequence identifier, the offset from the beginning of the sequence to the block in parentheses, the sequence segment, and a weight for the segment.

3. Which of the given statements is incorrect about READSEQ?
a) It is an extremely useful sequence formatting program developed by D. G. Gilbert at Indiana University, Bloomington
b) It was developed at Indiana University, Bloomington
c) It can recognize a DNA or protein sequence file in any of the formats
d) It can recognize a DNA or protein sequence file in some particular formats

Answer

Answer: d [Reason:] It can identify the format, and write a new file with an alternative format. Some of these formats are used for special types of analyses such as multiple sequence alignment and phylogenetic analysis.

4. Data files that have multiple sequences, such as those required for multiple sequence alignment and phylogenetic analysis using parsimony (PAUP), are not converted in READSEQ.
a) True
b) False

Answer

Answer: b [Reason:] Data files with multiple sequences, such as those mentioned, are also converted by READSEQ. Options to reverse-complement and to remove gaps from sequences are included. SEQIO is another sequence conversion program for UNIX machines.

5. The “from” programs convert sequence files from the named format into GCG format, and the “to” programs convert GCG files into the named format.
a) True
b) False

Answer

Answer: a [Reason:] In addition, the GCG programs include the following sequence formatting programs: (1) GETSEQ, which converts a simple ASCII file being received from a remote PC to GCG format; (2) REFORMAT, which will format a GCG file that has been edited, and will also perform other functions; and (3) SPEW, which sends a GCG sequence file as an ASCII file to a remote PC.

6. The Common Object Request Broker Architecture (CORBA) is the Object Management Group’s interface for objects.
a) True
b) False

Answer

Answer: a [Reason:] It allows different computer applications to communicate with each other through a common language, Interface Definition Language (IDL). To plan an object-oriented database by defining the classes of objects and the relationships among these objects, a specific set of procedures called the Unified Modeling Language (UML) has been devised by the OMG group.

7. The FASTA format is readily converted into other formats and also is smaller and simpler
a) True
b) False

Answer

Answer: a [Reason:] It contains just a line of sequence identifiers followed by the sequence without numbers, and is very useful for browsing and analysis. One browser window may retrieve sequences from a database and a second may analyze these sequences.
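
Because the format is so simple, it can be read with a few lines of Python. The sketch below assumes the usual convention that an identifier line starts with ">" and that the sequence may be wrapped over several lines; the function name and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each header line (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join wrapped sequence lines into one string per record.
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical two-record file, invented for illustration.
EXAMPLE = """>seq1 example DNA
ACGT
TTGA
>seq2
MKV
"""

for name, sequence in parse_fasta(EXAMPLE).items():
    print(name, sequence)
```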

8. Each DNA or protein sequence database entry has much information. Which of the following is not included?
a) an assigned accession number(s)
b) source organism
c) name of locus
d) reference number type(s)

Answer

Answer: d [Reason:] In addition to the accession number(s), source organism, and locus name, an entry includes keywords that apply to the sequence; features in the sequence such as coding regions, intron splice sites, and mutations; and finally the sequence itself. This information is organized into a tabular form very much like that found in a relational database.

9. Which of the following is an incorrect statement?
a) The last column contains the sequences themselves
b) It is quite tough making an index of the information in each of these fields so that a search query can locate all the occurrences through the index
c) If one imagines a large table with each sequence entry occupying one row, then each column will include one of the above types of information for each sequence, and each column is called a FIELD
d) The DNA, protein, and reference databases have all been cross-referenced so that moving between them is readily accomplished

Answer

Answer: b [Reason:] It is very easy to make an index of the information in each of these fields so that a search query can locate all the occurrences through the index. Even related sequences are cross-referenced. In addition, the information in one database can be cross-referenced to that in another database.
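
A field index of this kind can be sketched in Python. The entries, accession numbers, and function names below are hypothetical and only illustrate the idea of one row per sequence entry, one column per field, and an index from field values to rows.

```python
from collections import defaultdict

# Hypothetical table: each entry is one row, each key is one field.
ENTRIES = [
    {"accession": "HYP00001", "organism": "Homo sapiens", "locus": "HBB", "sequence": "ATGGTGCAT"},
    {"accession": "HYP00002", "organism": "Mus musculus", "locus": "Hbb", "sequence": "ATGGTGCAC"},
    {"accession": "HYP00003", "organism": "Homo sapiens", "locus": "INS", "sequence": "ATGGCCCTG"},
]

def build_index(entries, field):
    """Map each lower-cased value of a field to the rows that contain it."""
    index = defaultdict(list)
    for row, entry in enumerate(entries):
        index[entry[field].lower()].append(row)
    return index

def lookup(entries, index, value):
    """Return the accession numbers of all entries matching the value."""
    return [entries[row]["accession"] for row in index.get(value.lower(), [])]

organism_index = build_index(ENTRIES, "organism")
print(lookup(ENTRIES, organism_index, "Homo sapiens"))
```

Once the index exists, a query locates every occurrence of a value without scanning the sequences themselves.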

10. Which of the given statements is incorrect about Database Types?
a) Relational databases are more useful in the development of biological databases
b) The tables in relational database are carefully indexed and cross-referenced with each other, sometimes using additional tables, so that each item in the database has a unique set of identifying features
c) The relational database orders data in tables made up of rows giving specific items in the database, and columns giving the features as attributes of those items
d) The two principal types of DBs are the relational and object-oriented databases

Answer

Answer: a [Reason:] The object-oriented database structure has been useful in the development of biological databases. The objects, such as genetic maps, genes, or proteins, each have an associated set of utilities for analysis and display of the object and a set of attributes such as identifying name or references.

Database MCQ Set 4

1. The rigorous dynamic programming method is normally not used for database searching, because it is slow and computationally expensive.
a) True
b) False

Answer

Answer: a [Reason:] Heuristics such as BLAST and FASTA are developed for faster speed. However, the heuristic methods are limited in sensitivity and are not guaranteed to find the optimal alignment. They often fail to find alignment for distantly related sequences.

2. FASTA and BLAST are ____ but ____ for larger datasets.
a) faster, more sensitive
b) faster, less sensitive
c) slower, less sensitive
d) slower, more sensitive

Answer

Answer: b [Reason:] Empirical tests have indeed shown that the exhaustive method produces superior results over heuristic methods such as BLAST and FASTA. Heuristic methods, however, are faster and more practical for searching larger datasets, at the cost of comparatively low sensitivity.

3. ScanPS is a web-based program that implements a modified version of the Needleman–Wunsch algorithm.
a) True
b) False

Answer

Answer: b [Reason:] ScanPS (Scan Protein Sequence) is a web-based program that implements a modified version of the Smith–Waterman algorithm optimized for parallel processing. The major feature is that the program allows iterative searching similar to PSI-BLAST, which builds profiles from one round of search results and uses them for the second round of database searching. Full dynamic programming is used in each cycle for added sensitivity.

4. ParAlign is a web-based server that uses parallel processors to perform exhaustive sequence comparisons using either a parallelized version of the Smith–Waterman algorithm or a heuristic program for further speed gains.
a) True
b) False

Answer

Answer: a [Reason:] The heuristic subprogram first finds exact ungapped alignments and uses them as anchors for extension into gapped alignments by combining the scores of several diagonals in the alignment matrix. The search speed of ParAlign approaches that of BLAST, but with higher sensitivity.

5. In the Smith–Waterman algorithm, in the initialization step, the ___ row and ___ column are set to 0.
a) first, first
b) first, second
c) second, first
d) first, last
d) first, last

Answer

Answer: a [Reason:] In the Smith–Waterman algorithm, the first row and first column are set to 0. In the Needleman–Wunsch algorithm, the first row and first column are subject to the gap penalty.
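
The two initialization rules can be compared in a short Python sketch. It assumes a linear gap penalty; the function name and the penalty value of -2 are illustrative choices, not part of either algorithm's definition.

```python
def init_matrix(m, n, gap=-2, local=True):
    """Build an (m+1) x (n+1) scoring matrix with its borders initialized.

    local=True  : Smith-Waterman style, first row and column are 0.
    local=False : Needleman-Wunsch style, first row and column
                  accumulate the (assumed linear) gap penalty.
    """
    matrix = [[0] * (n + 1) for _ in range(m + 1)]
    if not local:
        for i in range(1, m + 1):
            matrix[i][0] = i * gap
        for j in range(1, n + 1):
            matrix[0][j] = j * gap
    return matrix

print("local :", init_matrix(2, 3, local=True)[0])
print("global:", init_matrix(2, 3, local=False)[0])
```

Zero borders let a local alignment start anywhere at no cost, whereas the penalized borders of a global alignment charge for leading gaps.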

6. Local sequence alignments are necessary in many cases, one of which is the detection of repeats.
a) True
b) False

Answer

Answer: a [Reason:] Local alignment is also useful when genes and proteins have a modular organization (exons, domains, etc.), and when sequences have diverged so that similarity was retained, or can be detected, only in some sub-regions.

7. In the SW algorithm, to align two sequences of lengths m and n, _____ time is required.
a) O(mn)
b) O(m²n)
c) O(m²n³)
d) O(mn²)

Answer

Answer: b [Reason:] The Smith–Waterman algorithm is quite demanding of time. In its original form, with a general gap penalty, aligning two sequences of lengths m and n requires O(m²n) time. Later optimizations for linear and affine gap penalties reduce this to O(mn) calculation steps.

8. One of the challenges in SWA is obtaining correct alignments in regions of low similarity between distantly related biological sequences.
a) True
b) False

Answer

Answer: a [Reason:] It is because mutations have added too much ‘noise’ over evolutionary time to allow for a meaningful comparison of those regions. Local alignment avoids such regions altogether and focuses on those with a positive score, i.e. those with an evolutionarily conserved signal of similarity.

9. Score can be negative in Smith–Waterman algorithm.
a) True
b) False

Answer

Answer: b [Reason:] In the Smith–Waterman algorithm, any negative score is set to 0, whereas in the Needleman–Wunsch algorithm the score can be negative. Also, in the Smith–Waterman traceback step, the path begins at the highest score and ends when a 0 is encountered.
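
Both points, the floor at 0 and the traceback from the highest score, appear in the following Python sketch of Smith–Waterman. It assumes a linear gap penalty and simple match/mismatch scores; the parameter values and example sequences are illustrative assumptions, and real programs use substitution matrices and affine gaps.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    scores = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill step: every cell is floored at 0, so no score is negative.
    for i in range(1, rows):
        for j in range(1, cols):
            scores[i][j] = max(
                0,
                scores[i - 1][j - 1] + substitution(i, j),
                scores[i - 1][j] + gap,
                scores[i][j - 1] + gap,
            )
            if scores[i][j] > best:
                best, best_pos = scores[i][j], (i, j)

    # Traceback: start at the highest score, stop when a 0 is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and scores[i][j] > 0:
        if scores[i][j] == scores[i - 1][j - 1] + substitution(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif scores[i][j] == scores[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

# Hypothetical sequences sharing only a short local region.
print(smith_waterman("TTACGTTT", "GGACGGG"))
```

The two nested loops visit each of the (m+1)(n+1) cells once and do constant work per cell, which is why this linear-gap form runs in O(mn) steps.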

10. The function of the scoring matrix is to conduct one-to-one comparisons between all components in two sequences and record the optimal alignment results.
a) True
b) False

Answer

Answer: a [Reason:] The scoring process reflects the concept of dynamic programming. The final optimal alignment is found by iteratively expanding the growing optimal alignment.

Database MCQ Set 5

1. Which of the following statements about COG is incorrect regarding its features?
a) Currently, there are 4,873 clusters in the COG databases derived from unicellular organisms
b) It is constructed by comparing protein sequences encoded in forty-three completely sequenced genomes, which are mainly from prokaryotes, representing thirty major phylogenetic lineages
c) The interface for sequence searching in the COG database is the COGnitor program, which is based on gapped BLAST
d) It is a protein family database based on structural classification

Answer

Answer: d [Reason:] COG, which stands for Clusters of Orthologous Groups, is a protein family database based on phylogenetic classification. Because orthologous proteins shared by three or more lineages are considered to have descended through a vertical evolutionary scenario, if the function of one of the members is known, functionality of other members can be assigned.

2. Which of the following statements about ProtoNet is incorrect regarding its features?
a) Protein relatedness is defined by the P-values from the BLAST alignments
b) The most closely related sequences are grouped into the lowest level clusters
c) More distant protein groups are merged into higher levels of clusters
d) The outcome of this cluster merging is a tree-like structure of functional categories

Answer

Answer: a [Reason:] ProtoNet is a database of clusters of homologous proteins similar to COG. Protein relatedness is defined by the E-values from the BLAST alignments. The database further provides gene ontology information for the protein clusters at each level, as well as keywords from InterPro domains for functional prediction.

3. Pfam is available at four locations around the world. Which of the following is not one of them?
a) UK
b) Sweden
c) US
d) Japan

Answer

Answer: d [Reason:] Pfam is available at four locations around the world each providing a core set of functionality for accessing each family. They are US, UK, Sweden and France. Documentation on the content and use of Pfam is available via the web.

4. Which of the following is not a member database of InterPro?
a) SCOP
b) HAMAP
c) PANTHER
d) Pfam

Answer

Answer: a [Reason:] The signatures from InterPro come from 11 member databases viz. CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, TIGRFAMs.

5. Which of the following statements about SCOP is incorrect regarding its features?
a) Proteins with the same shapes but having little sequence or functional similarity are placed in different super families, and are assumed to have only a very distant common ancestor
b) Proteins having the same shape and some similarity of sequence and/or function are placed in ‘families’, and are assumed to have a closer common ancestor
c) SCOP was created in 1994 in the Centre of Protein Engineering and the University College London
d) It aims to determine the evolutionary relationship between proteins

Answer

Answer: c [Reason:] SCOP, Structural Classification of Proteins, was created in 1994 in the Centre of Protein Engineering and the Laboratory of Molecular Biology. It was maintained by Alexey G. Murzin and his colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the Laboratory of Molecular Biology in Cambridge, England.

6. What is the source of protein structures in SCOP and CATH?
a) Uniprot
b) Protein Data Bank
c) Ensembl
d) InterPro

Answer

Answer: b [Reason:] The source of protein structures in both SCOP and CATH is the PDB (Protein Data Bank), the primary archive of experimentally determined three-dimensional structures. SCOP and CATH are secondary databases derived from it. UniProt is a protein sequence database.

7. Which of the following statements about SUPERFAMILY database is incorrect regarding its features?
a) Sequences can be submitted in raw or FASTA format
b) Sequences must be submitted in FASTA format only
c) It searches the database using a superfamily, family, or species name plus a sequence, SCOP, PDB or HMM ID’s
d) It has generated GO annotations for evolutionarily closed domains and distant domains

Answer

Answer: b [Reason:] SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes. It classifies amino acid sequences into known structural domains, especially into SCOP super families. Sequences can be amino acids, a fixed frame nucleotide sequence, or all frames of a submitted nucleotide sequence. Up to 1000 sequences can be run at a time.

8. Which of the following statements about PRINTS and ProDom databases is incorrect regarding its features?
a) PRINTS is a compendium of protein fingerprints
b) Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space
c) Current versions of ProDom are built using a novel procedure based on recursive BLAST searches
d) ProDom domain database consists of an automatic compilation of homologous domains

Answer

Answer: c [Reason:] Current versions of ProDom are built using a novel procedure based on recursive PSI-BLAST searches and not just BLAST searches. And PRINTS is indeed a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of UniProt.

9. Which of the following statements about CATH-Gene3D and HAMAP databases is incorrect regarding its features?
a) CATH-Gene3D describes protein families and domain architectures in complete genomes
b) In CATH-Gene3D the functional annotation is provided to proteins from single resource
c) HAMAP profiles are manually created by expert curators; they identify proteins that are part of well-conserved bacterial, archaeal and plastid-encoded protein families or subfamilies
d) HAMAP stands for High-quality Automated and Manual Annotation of microbial Proteomes

Answer

Answer: b [Reason:] In CATH-Gene3D, protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Mapping of predicted structure and sequence domains is undertaken using hidden Markov model libraries representing CATH and Pfam domains. Functional annotation is provided to proteins from multiple resources. Functional prediction and analysis of domain architectures are available at the website.

10. Which of the following statements about PANTHER and TIGRFAMs databases is incorrect regarding its features?
a) TIGRFAMs provides a tool for identifying functionally related proteins based on sequence homology
b) TIGRFAMs is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation
c) Hidden Markov models (HMMs) are not used in PANTHER
d) PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise

Answer

Answer: c [Reason:] In PANTHER the subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (human-curated molecular function and biological process classifications and pathway diagrams), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences.

Database MCQ Set 6

1. Which of the following is incorrect about ENTREZ?
a) It is a resource prepared only by the staff of the National Center for Biotechnology Information
b) It provides a series of forms that can be filled out to retrieve a Medline reference related to the molecular biology sequence databases
c) One straightforward way to access the sequence databases is through ENTREZ
d) It provides a series of forms that can be filled out to retrieve a DNA or protein sequence

Answer

Answer: a [Reason:] It is a resource prepared by the staff of the National Center for Biotechnology Information and the National Library of Medicine, Bethesda, Maryland. After a search for either a protein or a DNA sequence is chosen on the ENTREZ page, another Web page is provided with a form to fill out for the search.

2. The databases GenBank, EMBL and DDBJ are updated daily.
a) True
b) False

Answer

Answer: a [Reason:] These databases are updated daily and exchange new sequences daily, so it is only necessary to access one of them. EMBL stands for European Molecular Biology Laboratory and DDBJ for DNA Data Bank of Japan.

3. Using boolean logic, the search looks for database entries that include the first term ____ the second, and subsequent terms repeated until the last term.
a) AND
b) OR
c) ExOR
d) NAND

Answer

Answer: a [Reason:] On the ENTREZ form, make a selection in the data entry window after the term “Search,” then enter search terms in the longer data entry window after “for.” The database will be searched for sequence database entries that contain all of these terms or related ones.
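
The Boolean AND logic can be sketched in Python. This is only an illustration of the logic on a hypothetical list of entry descriptions; it does not call ENTREZ, and the function names are invented for the example. An OR search is included for contrast.

```python
def and_search(entries, terms):
    """Keep entries that contain every search term (case-insensitive)."""
    terms = [term.lower() for term in terms]
    return [entry for entry in entries
            if all(term in entry.lower() for term in terms)]

def or_search(entries, terms):
    """Keep entries that contain at least one search term."""
    terms = [term.lower() for term in terms]
    return [entry for entry in entries
            if any(term in entry.lower() for term in terms)]

# Hypothetical entry descriptions, invented for illustration.
ENTRY_TITLES = [
    "Homo sapiens hemoglobin beta chain mRNA",
    "Mus musculus hemoglobin alpha chain",
    "Homo sapiens insulin precursor",
]

print(and_search(ENTRY_TITLES, ["Homo sapiens", "hemoglobin"]))
print(or_search(ENTRY_TITLES, ["insulin", "alpha"]))
```

Each added AND term can only shrink the result list, which is why adding terms narrows a search.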

4. To assist in finding suitable terms, for each field, ENTREZ provides a list of index entries.
a) True
b) False

Answer

Answer: a [Reason:] When searching for terms in a particular field, some knowledge of the terms that are in the database can be helpful. The “Limits” link on the ENTREZ form page is used to limit the GenBank field to be searched, and various logical combinations of search terms may be designed by this method. These fields refer to the GenBank fields.

5. For a protein search, for example, current choices for fields include ______
Which of the following is a wrong blank?
a) Accession (number)
b) E. C. number
c) Issue
d) Journal number

Answer

Answer: d [Reason:] The other fields are author name, journal name, keyword, and modification date, as well as organism, page number, primary accession (number), properties, protein name, publication date (of reference), seqID string, sequence length, substance name, text word, title word, volume, and sequence ID. Similar fields are shown for the DNA database search.

6. The results of searches in separate fields may be combined to narrow down the choices.
a) True
b) False

Answer

Answer: a [Reason:] The number of terms to search for and the field to be searched are the main decisions to be made. In doing so, it is important to be as specific as possible, or else there may be a great many matches.
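
Combining the results of searches in separate fields amounts to intersecting the sets of matching entries. The following is a minimal sketch, assuming hypothetical records with `id`, `organism`, and `protein_name` keys; these names are illustrative and are not the actual GenBank field identifiers.

```python
def search_field(records, field, term):
    """Return the set of record IDs whose given field contains the term."""
    term = term.lower()
    return {r["id"] for r in records if term in r[field].lower()}


# Hypothetical records; the IDs and field names are made up for illustration.
records = [
    {"id": "P1", "organism": "Homo sapiens", "protein_name": "hemoglobin alpha"},
    {"id": "P2", "organism": "Mus musculus", "protein_name": "hemoglobin beta"},
    {"id": "P3", "organism": "Homo sapiens", "protein_name": "insulin"},
]

by_organism = search_field(records, "organism", "homo sapiens")
by_protein = search_field(records, "protein_name", "hemoglobin")

# Intersecting the per-field results narrows the candidate list.
combined = by_organism & by_protein
print(len(by_organism), len(by_protein), sorted(combined))
```

Each field search on its own returns two records here, while the combination leaves a single candidate.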

7. Knowing ________ should be enough to find the required entry quickly.
a) publication date, protein name, journal name
b) accession number, protein name, or name of gene
c) publication date, protein name, or volume
d) properties, protein name, or title word

Answer

Answer: b [Reason:] If the same protein has been sequenced in several organisms, providing an organism name is also helpful. Once the search terms and fields have been chosen and submitted, a database comprising all of the currently available sequences (called the nonredundant or NR database) will be searched. Other database selections can also be made.
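
As a rough illustration of pairing a protein name with an organism name, the sketch below assembles a single search string. It assumes the `"term"[Field]` tag syntax accepted by the current NCBI Entrez search box, which the text above does not describe; the helper `build_query` and the example values are hypothetical.

```python
def build_query(**fields):
    """Join field-restricted terms with AND, e.g. "hemoglobin"[Protein Name]."""
    parts = []
    for field, term in fields.items():
        if term:  # skip fields that were left empty
            # Turn a keyword such as protein_name into the label "Protein Name".
            label = field.replace("_", " ").title()
            parts.append(f'"{term}"[{label}]')
    return " AND ".join(parts)


# Assumed field labels; check them against the live search form before use.
query = build_query(protein_name="hemoglobin", organism="Homo sapiens")
print(query)
```

Leaving a field empty simply drops it from the query, so the same helper covers searches by one term or by several.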

8. The program returns the number of matches found and provides an opportunity to narrow this list by including more terms.
a) True
b) False

Answer

Answer: a [Reason:] When the number of matching sequences has been narrowed to a reasonable number, the sequence may be retrieved in a chosen format in several straightforward steps. This helps in reaching the required data in fewer steps.

9. Which of the following is incorrect about ENTREZ?
a) There is no simple way to find the correct sequence without manually checking the information provided in each sequence, but this usually takes a long time
b) Before leaving ENTREZ, it is often useful to check for sequence database entries that are similar to the one of interest, called “neighbors” by ENTREZ
c) The expanded query searches other database entries of interest, such as the same protein in another organism, a large chromosomal sequence that includes the gene, or members of the same gene family
d) While visiting the site, note that ENTREZ has been adapted to search through a number of other biological databases, and also through Medline, and these searches are available from the initial ENTREZ Web page

Answer

Answer: a [Reason:] Contrary to option a, this manual check usually takes only a short time. It is important to look through the sequences to locate the one intended. There may be several different copies of the sequence because it may have been sequenced from more than one organism, or the sequence may be a mutant sequence, a particular clone, or a fragment.

10. Which of the following is incorrect about Retrieving a Specific Sequence?
a) It can be difficult to retrieve the sequence of a specific gene or protein simply because of the sheer number of sequences in the GenBank database and the complex problem of indexing them
b) Other projects may benefit from the availability of better curated and annotated protein sequence databases, but not PIR and SwissProt
c) For projects that require the most currently available sequences, the NR databases should be searched
d) The genomic databases can also provide the sequence of a particular gene or protein. Protein sequences in the Genpro database are generated by automatic translation of DNA sequences

Answer

Answer: b [Reason:] Curated and annotated protein sequence databases include PIR and SwissProt. Protein sequences translated from cDNA copies of mRNA sequences are reliable, apart from a certain amount of uncertainty as to the translational start site. Many protein sequences are now predicted by translation of genomic sequences, which requires a prediction of exons, a somewhat error-prone step.

DistPub Team