Q1: All of the following accurately describe Hadoop, EXCEPT
Answer option
- A. Open source
- B. Real-time
- C. Java-based
- D. Distributed computing approach
Answer
Answer B. Real-time
Q2: ___ has the world’s largest Hadoop cluster
Answer option
- A. Apple
- B. Datamatics
- C. Facebook
- D. None of the mentioned
Answer
Answer C. Facebook
Q3: What are the five V’s of Big Data?
Answer option
- A. Volume
- B. Velocity
- C. Variety
- D. All the above
Answer
Answer D. All the above
Q4: ___ hides the limitations of Java behind a powerful and concise Clojure API for Cascading
Answer option
- A. Scalding
- B. Cascalog
- C. Hcatalog
- D. Hcalding
Answer
Answer B. Cascalog
Q5: What are the main components of Hadoop?
Answer option
- A. MapReduce
- B. HDFS
- C. YARN
- D. All of these
Answer
Answer D. All of these
Q6: What are the different features of Big Data Analytics?
Answer option
- A. Open-Source
- B. Scalability
- C. Data Recovery
- D. All the above
Answer
Answer D. All the above
Q7: Define the Port Numbers for NameNode, Task Tracker and Job Tracker
Answer option
- A. NameNode
- B. Task Tracker
- C. Job Tracker
- D. All of the above
Answer
Answer D. All of the above
Q8: This is an approach to selling goods and services in which a prospect explicitly agrees in advance to receive marketing information
Answer option
- A. customer managed relationship
- B. data mining
- C. permission marketing
- D. one-to-one marketing
- E. batch processing
Answer
Answer C. permission marketing
Q9: This is an XML-based metalanguage developed by the Business Process Management Initiative (BPMI) as a means of modeling business processes, much as XML is, itself, a metalanguage with the ability to model enterprise data
Answer option
- A. BizTalk
- B. BPML
- C. e-biz
- D. ebXML
- E. ECB
Answer
Answer B. BPML
Q10: This is a central point in an enterprise from which all customer contacts are managed
Answer option
- A. contact center
- B. help system
- C. multichannel marketing
- D. call center
- E. help desk
Answer
Answer A. contact center
Q11: This is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests, spending habits, and so on
Answer option
- A. customer service chat
- B. customer managed relationship
- C. customer life cycle
- D. customer segmentation
- E. change management
Answer
Answer D. customer segmentation
Q12: Can decision trees be used for performing clustering?
Answer option
- A. True
- B. False
Answer
Answer A. True
Q13: Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less than desirable number of data points?
1. Capping and flooring of variables
2. Removal of outliers
Answer option
- A. 1 only
- B. 2 only
- C. 1 and 2
- D. None of the above
Answer
Answer A. 1 only
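Capping and flooring keeps every observation and only pulls extreme values back to chosen limits, which is why it suits a small data set better than deleting outliers. The following is a minimal sketch of the idea, not a prescribed procedure: the 5th and 95th percentile limits, the function names, and the sample values are all assumptions made for illustration.

```python
def percentile(sorted_vals, p):
    """Linearly interpolated percentile (p in [0, 100]) of a sorted list."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def cap_and_floor(values, lower_pct=5, upper_pct=95):
    """Clip values to the chosen percentiles without dropping any row."""
    s = sorted(values)
    floor_val = percentile(s, lower_pct)  # values below this are raised
    cap_val = percentile(s, upper_pct)    # values above this are lowered
    return [min(max(v, floor_val), cap_val) for v in values]


# Made-up sample with one extreme value (250).
data = [12, 15, 14, 13, 16, 15, 14, 250]
clipped = cap_and_floor(data)
print(clipped)
```

The output has the same length as the input, so no data point is lost before clustering.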
Q14: The problem of finding hidden structure in unlabeled data is called
Answer option
- A. Supervised learning
- B. Unsupervised learning
- C. Reinforcement learning
Answer
Answer B. Unsupervised learning
Q15: Task of inferring a model from labeled training data is called
Answer option
- A. Unsupervised learning
- B. Supervised learning
- C. Reinforcement learning
Answer
Answer B. Supervised learning
Q16: A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of
Answer option
- A. Supervised learning
- B. Data extraction
- C. Serration
- D. Unsupervised learning
Answer
Answer D. Unsupervised learning
Q17: Self-organizing maps are an example of
Answer option
- A. Unsupervised learning
- B. Supervised learning
- C. Reinforcement learning
- D. Missing data imputation
Answer
Answer A. Unsupervised learning
Q18: You are given data about seismic activity in Japan, and you want to predict the magnitude of the next earthquake. This is an example of
Answer option
- A. Supervised learning
- B. Unsupervised learning
- C. Serration
- D. Dimensionality reduction
Answer
Answer A. Supervised learning
Q19: Assume you want to perform supervised learning to predict the number of newborns according to the size of the storks’ population. This is an example of
Answer option
- A. Classification
- B. Regression
- C. Clustering
- D. Structural equation modelling
Answer
Answer B. Regression
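Regression predicts a numeric outcome from one or more features. As a rough illustration, the sketch below fits a straight line by ordinary least squares; the function name `fit_line` and the stork and newborn counts are made up for the example and do not come from any real data set.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical numbers: feature = stork population, outcome = newborns.
storks = [1, 2, 3, 4]
newborns = [3, 5, 7, 9]

slope, intercept = fit_line(storks, newborns)
predicted = slope * 5 + intercept  # predicted outcome for a population of 5
print(slope, intercept, predicted)
```

Here the stork population is the feature and the number of newborns is the outcome being predicted.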
Q20: Discriminating between spam and ham e-mails is a classification task, true or false?
Answer option
- A. True
- B. False
Answer
Answer A. True
Q21: In the example of predicting the number of babies based on the storks’ population size, the number of babies is
Answer option
- A. outcome
- B. feature
- C. attribute
- D. observation
Answer
Answer A. outcome
Q22: Data set {brown, black, blue, green, red} is an example of
Answer option
- A. Continuous attribute
- B. Ordinal attribute
- C. Numeric attribute
- D. Nominal attribute
Answer
Answer D. Nominal attribute
Q23: Which of the following activities is NOT a data mining task?
Answer option
- A. Predicting the future stock price of a company using historical records
- B. Monitoring and predicting failures in a hydropower plant
- C. Extracting the frequencies of a sound wave
- D. Monitoring the heart rate of a patient for abnormalities
Answer
Answer C. Extracting the frequencies of a sound wave
Q24: Data Visualization in mining cannot be done using
Answer option
- A. Photos
- B. Graphs
- C. Charts
- D. Information Graphics
Answer
Answer A. Photos
Q25: Which of the following is not a data pre-processing method?
Answer option
- A. Data Visualization
- B. Data Discretization
- C. Data Cleaning
- D. Data Reduction
Answer
Answer A. Data Visualization
Q26: Dimensionality reduction reduces the data set size by removing ___
Answer option
- A. composite attributes
- B. derived attributes
- C. relevant attributes
- D. irrelevant attributes
Answer
Answer D. irrelevant attributes
Q27: The difference between supervised learning and unsupervised learning is given by
Answer option
- A. unlike unsupervised learning, supervised learning needs labeled data
- B. unlike unsupervised learning, supervised learning can be used to detect outliers
- C. there is no difference
- D. unlike supervised learning, unsupervised learning can form new classes
Answer
Answer A. unlike unsupervised learning, supervised learning needs labeled data
Q28: Which of the following activities is a data mining task?
Answer option
- A. Monitoring the heart rate of a patient for abnormalities
- B. Extracting the frequencies of a sound wave
- C. Predicting the outcomes of tossing a (fair) pair of dice
- D. Dividing the customers of a company according to their profitability
Answer
Answer A. Monitoring the heart rate of a patient for abnormalities
Q29: Identify the example of sequence data
Answer option
- A. weather forecast
- B. data matrix
- C. market basket data
- D. genomic data
Answer
Answer D. genomic data
Q30: Which data mining task should be used to detect fraudulent usage of credit cards?
Answer option
- A. Outlier analysis
- B. prediction
- C. association analysis
- D. feature selection
Answer
Answer A. Outlier analysis
Q31: Which of the following is NOT an example of ordinal attributes?
Answer option
- A. Zip codes
- B. Ordered numbers
- C. Movie ratings
- D. Military ranks
Answer
Answer A. Zip codes
Q32: Data scrubbing can be defined as
Answer option
- A. Check field overloading
- B. Delete redundant tuples
- C. Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
- D. Analyzing data to discover rules and relationship to detect violators
Answer
Answer C. Use simple domain knowledge (e.g., postal code, spell-check) to detect errors and make corrections
Q33: Which data mining task can be used for predicting wind velocities as a function of temperature, humidity, air pressure, etc.?
Answer option
- A. Cluster Analysis
- B. Regression
- C. Classification
- D. Sequential pattern discovery
Answer
Answer B. Regression
Q34: Which statement is not TRUE regarding a data mining task?
Answer option
- A. Clustering is a descriptive data mining task
- B. Classification is a predictive data mining task
- C. Regression is a descriptive data mining task
- D. Deviation detection is a predictive data mining task
Answer
Answer C. Regression is a descriptive data mining task
Q35: Identify the example of a nominal attribute
Answer option
- A. Temperature
- B. Salary
- C. Mass
- D. Gender
Answer
Answer D. Gender
Q36: A synonym for data mining is
Answer option
- A. Data Warehouse
- B. Knowledge discovery in database
- C. Business intelligence
- D. OLAP
Answer
Answer B. Knowledge discovery in database
Q37: Nominal and ordinal attributes can be collectively referred to as ___ attributes
Answer option
- A. perfect
- B. qualitative
- C. consistent
- D. optimized
Answer
Answer B. qualitative
Q38: Which of the following is not a data mining task?
Answer option
- A. Feature Subset Detection
- B. Association Rule Discovery
- C. Regression
- D. Sequential Pattern Discovery
Answer
Answer A. Feature Subset Detection
Q39: Which of the following is an entity identification problem?
Answer option
- A. One person with different email address
- B. One person’s name written in different way
- C. Title for person
- D. One person with multiple phone numbers
Answer
Answer B. One person’s name written in different way
Q40: In binning, we first sort the data and partition it into (equal-frequency) bins. Which of the following is then not a valid step?
Answer option
- A. smooth by bin boundaries
- B. smooth by bin median
- C. smooth by bin means
- D. smooth by bin values
Answer
Answer D. smooth by bin values
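For reference, here is a minimal sketch of equal-frequency binning with the three smoothing rules commonly described in data mining textbooks: by bin means, by bin medians, and by bin boundaries. The function names, the bin size of 3, and the sample values are illustrative assumptions.

```python
from statistics import mean, median


def equal_frequency_bins(values, bin_size):
    """Sort the data and split it into consecutive bins of bin_size items."""
    s = sorted(values)
    return [s[i:i + bin_size] for i in range(0, len(s), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    out = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out


# Illustrative sample values.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)

by_means = smooth_by_means(bins)
by_medians = smooth_by_medians(bins)
by_boundaries = smooth_by_boundaries(bins)
print(by_means, by_medians, by_boundaries, sep="\n")
```

Each rule replaces the values inside a bin with a local summary of that bin, which reduces noise while keeping the number of records unchanged.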