Data Warehousing and Mining
Assignment A
Q1: Find the frequent itemsets of maximum length for the given database D using the Apriori algorithm, with a minimum support count of 2 and a minimum confidence of 50%.
TID | List of item IDs |
T100 | I1, I2, I5 |
T101 | I2, I4 |
T102 | I2, I3 |
T103 | I1, I2, I4 |
T104 | I1, I3 |
T105 | I2, I3 |
T106 | I1, I3 |
T107 | I1, I2, I3, I5 |
T108 | I1, I2, I3 |
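The level-wise search that Q1 asks for can be sketched as follows. This is a minimal illustration rather than a reference solution: the function names `apriori` and `support_count` are chosen here for illustration, and the minimum confidence of 50% is not used because it only matters when rules are generated from the frequent itemsets.

```python
from itertools import combinations

# Database D from Q1, keyed by transaction ID.
D = {
    "T100": {"I1", "I2", "I5"},
    "T101": {"I2", "I4"},
    "T102": {"I2", "I3"},
    "T103": {"I1", "I2", "I4"},
    "T104": {"I1", "I3"},
    "T105": {"I2", "I3"},
    "T106": {"I1", "I3"},
    "T107": {"I1", "I2", "I3", "I5"},
    "T108": {"I1", "I2", "I3"},
}
MIN_SUPPORT_COUNT = 2


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return a list whose entry k-1 maps each frequent k-itemset to its count."""
    items = sorted({item for t in transactions for item in t})
    candidates = {frozenset([item]) for item in items}
    levels = []
    while candidates:
        counted = {c: support_count(c, transactions) for c in candidates}
        frequent = {c: n for c, n in counted.items() if n >= min_count}
        if not frequent:
            break
        levels.append(frequent)
        k = len(levels) + 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return levels


levels = apriori(list(D.values()), MIN_SUPPORT_COUNT)
for itemset, count in sorted(levels[-1].items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

The last entry of `levels` holds the frequent itemsets of maximum length; earlier entries hold the shorter ones with their support counts.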
Q2: What is Clustering? What are the different challenges to clustering?
Q3: What is classification? Explain the decision tree technique with example.
Q4: What is bitmap indexing? Explain with an example. Describe the techniques used to optimize the extraction process.
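As a starting point for the bitmap indexing part of Q4, the idea can be sketched in a few lines: one bit vector per distinct column value, with bit i set when row i holds that value. The table below and the helper names `build_bitmap_index` and `matching_rows` are invented for illustration; Python integers stand in for the bit vectors.

```python
# Hypothetical toy table; the rows and values are made up for this sketch.
rows = [
    {"region": "North", "grade": "A"},  # row 0
    {"region": "South", "grade": "B"},  # row 1
    {"region": "North", "grade": "B"},  # row 2
    {"region": "East", "grade": "A"},   # row 3
    {"region": "North", "grade": "A"},  # row 4
]


def build_bitmap_index(table, column):
    """Map each distinct value of `column` to an int used as a bit vector."""
    index = {}
    for i, row in enumerate(table):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def matching_rows(bitmap):
    """List the row numbers whose bit is set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


region_index = build_bitmap_index(rows, "region")
grade_index = build_bitmap_index(rows, "grade")

# The predicate region = 'North' AND grade = 'A' becomes one bitwise AND.
hits = matching_rows(region_index["North"] & grade_index["A"])
print(hits)
```

Bitmap indexes of this kind tend to suit low-cardinality columns, because the number of bit vectors grows with the number of distinct values.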
Q5: The retail industry is a major application area for data mining because it collects huge amounts of data on sales, customer shopping history, goods transportation, consumption, and service records. Justify this statement.
Q6: What are the different schemas for a data warehouse? Explain with examples.
Write a query that defines a cube in a star schema.
Q7: What is data mining in KDD? Explain the architecture of a typical data mining system.
Q8: What are the different OLAP operations in the multidimensional data model?
Case Study
Most banks and financial institutions offer a wide variety of banking services (such as checking, savings, and business and individual customer transactions), credit (such as business, mortgage, and automobile loans), and investment services (such as mutual funds). Some also offer insurance services and stock investment services.
Financial data collected in the banking and financial industries is often relatively complete, reliable, and of high quality, which facilitates systematic data analysis and data mining.
Q1. Give some typical cases that illustrate these facts: how you would design a data warehouse for such an institution, which techniques can be used in it, and how to support tasks such as loan payment prediction and customer credit policy analysis.
Assignment C
Q1: Which type of decision is one that happens repeatedly, and often periodically, whether weekly, monthly, quarterly, or yearly?
- a) Structured decision
- b) Nonstructured decision
- c) Nonrecurring decision
- d) None of the above
Q2: In multi-dimensional analysis, roll-up can be achieved by:
- a) Moving up a dimension hierarchy like city -> state, etc.
- b) Moving down a dimension hierarchy like state -> city, etc.
- c) Adding a new dimension
- d) Removing one or more dimensions
Q3: Which technique can be used to reduce the size of the candidate k-itemsets in the Apriori algorithm?
- a) Pruning
- b) Cleaning
- c) Hash based
- d) Partitioning
Q4: Classification belongs to which type of learning?
- a) Supervised
- b) Unsupervised
- c) Rote
- d) Machine
Q5: Structured Query Language (SQL) is a standardized fourth-generation query language found in most DBMSs.
- a) True
- b) False
- c) nil
- d) nil
Q6: A data warehouse is a logical collection of information – gathered from many different operational databases – used to create business intelligence that supports business analysis activities and decision-making tasks.
- a) True
- b) False
- c) nil
- d) nil
Q7: Online transaction processing (OLTP) is the manipulation of information to support decision making.
- a) True
- b) False
- c) nil
- d) nil
Q8: Data warehouses support only OLAP.
- a) True
- b) False
- c) nil
- d) nil
Q9: A database is a collection of information that you organize and access according to the physical structure of that information.
- a) True
- b) False
- c) nil
- d) nil
Q10: The physical view of information focuses on how you need to arrange and access information to meet the needs of the business.
- a) True
- b) False
- c) nil
- d) nil
Q11: A view helps you add, change, and delete information in a database and mine it for valuable information.
- a) True
- b) False
- c) nil
- d) nil
Q12: The Apriori algorithm is used in:
- a) Clustering
- b) Classification
- c) Association Rule mining
- d) None of these
Q13: The DBSCAN technique comes under:
- a) Classification
- b) Clustering
- c) Prediction
- d) Other
Q14: A data warehouse schema can be a:
- a) Star schema
- b) Asynchronous table
- c) Formula
- d) None of these
Q15: In association rule mining, support(A => B) is:
- a) Probability P(A ∪ B)
- b) Subtraction, i.e. (A - B)
- c) Addition, i.e. (A + B)
- d) None of these
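The notation in Q15 is easy to misread: in association rule mining, A ∪ B denotes the union of the two itemsets, so P(A ∪ B) is the fraction of transactions that contain every item of both A and B. The sketch below computes support and confidence for one rule over the transactions of Assignment A, Q1. The rule {I1} => {I2} and the function names are chosen purely for illustration.

```python
from fractions import Fraction

# Transactions T100..T108 from Assignment A, Q1.
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]


def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    both = antecedent | consequent  # itemset union, i.e. "A ∪ B"
    return support(both), support(both) / support(antecedent)


rule_support, rule_confidence = rule_metrics({"I1"}, {"I2"})
print(rule_support, rule_confidence)
```

Exact fractions are used instead of floats so the values can be compared directly with hand calculations.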
Q16: Select the technique that is not a data mining technique:
- a) Association rule mining
- b) Clustering
- c) Classification
- d) Selection Sort
Q17: Which step in the Apriori algorithm uses the Apriori property?
- a) Join step
- b) Prune step
- c) Candidate generation step
- d) Query processing
Q18: Which among the following is an accuracy measure of a classifier?
- a) Support
- b) Confidence
- c) Specificity
- d) Recall
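For reference, recall and specificity from options c) and d) are both derived from a classifier's confusion matrix: recall is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes them for a small set of labels that is invented for illustration; the function name `confusion_counts` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    return tp, fn, tn, fp


tp, fn, tn, fp = confusion_counts(actual, predicted)
recall = Fraction(tp, tp + fn)        # true positive rate (sensitivity)
specificity = Fraction(tn, tn + fp)   # true negative rate
accuracy = Fraction(tp + tn, len(actual))
print(recall, specificity, accuracy)
```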
Q19: Which decision type below represents employing a new marketing campaign?
- a) Structured decision
- b) Nonstructured decision
- c) Nonrecurring decision
- d) Recurring decision
Q20: Which phrase describes a decision support system?
- a) Highly flexible IT system
- b) Interactive IT system
- c) Designed to support decision making
- d) All of the above
Q21: Which component of a DSS consists of both the DSS models and the DSS model management system?
- a) Data management
- b) Model management
- c) User interface management
- d) All of the above
Q22: Which type of AI system is an adaptive system that works independently, carrying out specific, repetitive, or predictable tasks?
- a) Expert systems
- b) Neural networks
- c) Genetic algorithms
- d) None of the above
Q23: All of the following are IT components of an expert system, except:
- a) Knowledge base
- b) User acquisition
- c) Inference engine
- d) User interface
Q24: An intelligent agent that checks your e-mail and sorts it according to priority is called a:
- a) User agent
- b) Monitoring-and-surveillance agent
- c) Data-mining agent
- d) Buyer agent
Q25: Which of the following uses a series of logically related two-dimensional tables or files to store information in the form of a database?
- a) Database
- b) Database management system
- c) Data warehouse
- d) None of the above
Q26: All of the following terms describe OLAP, except:
- a) The gathering of input information
- b) Processing input information
- c) Updating existing information to reflect the gathered and processed information
- d) None of the above
Q27: Which tool is used to help an organization build and use business intelligence?
- a) Data warehouse
- b) Data mining tools
- c) Database management systems
- d) All of the above
Q28: What does the data dictionary identify?
- a) Field names
- b) Field types
- c) Field formats
- d) All of the above
Q29: What DBMS component contains facilities to help you develop transaction-intensive applications?
- a) DBMS engine
- b) Data definition subsystem
- c) Application generation subsystem
- d) Data administration subsystem
Q30: Which of the following is a data manipulation tool?
- a) File generators
- b) Query by example tool
- c) Structured query language
- d) All of the above
Q31: What data manipulation tool is a standardized fourth-generation query language found in most DBMSs?
- a) Report generator
- b) Query-by-example tool
- c) Statistical tool
- d) None of the above
Q32: The data administration subsystem helps you perform all of the following, except:
- a) Backups and recovery
- b) Query optimization
- c) Security management
- d) Create, change, and delete information
Q33: Which data administration subsystem periodically backs up information contained in a database?
- a) Concurrency control facilities
- b) Reorganization facilities
- c) Backup and recovery facilities
- d) Security management facilities
Q34: Who is the person responsible for the more technical and operational aspects of managing the information contained in organizational databases?
- a) Chief information officer
- b) Data administration
- c) Data information officer
- d) None of the above
Q35: Which phase of decision making carries out the chosen solution, monitors the results, and makes adjustments as necessary?
- a) Intelligence
- b) Design
- c) Choice
- d) None of the above
Q36: Which type of intelligent agent travels around a network finding information and bringing it back to you?
- a) Personal agent
- b) User agent
- c) Monitoring-and-surveillance agent
- d) Shopping bots
Q37: Intelligent agents are software that does all of the following, except:
- a) Assists you
- b) Acts on your behalf
- c) Performs repetitive computer-related tasks
- d) Performs repetitive information-processing tasks
Q38: Business intelligence is information about your:
- a) Customers
- b) Competitors
- c) Partners
- d) Competitive environment
Q39: Operational databases are databases that support OLTP.
- a) True
- b) False
- c) nil
- d) nil
Q40: A primary key field can be blank.
- a) True
- b) False
- c) nil
- d) nil