Hadoop MCQ Set 1

1. ________ systems are scale-out file-based (HDD) systems moving to more uses of memory in the nodes.
a) NoSQL
b) NewSQL
c) SQL
d) All of the mentioned

View Answer

Answer: a [Reason:] NoSQL systems make the most sense whenever the application is based on data with varying data types and the data can be stored in key-value notation.

2. Point out the correct statement:
a) Hadoop is ideal for the analytical, post-operational, data-warehouse-ish type of workload
b) HDFS runs on a small cluster of commodity-class nodes
c) NEWSQL is frequently the collection point for big data
d) None of the mentioned

View Answer

Answer: a [Reason:] Hadoop, together with a relational data warehouse, can form a very effective data warehouse infrastructure.

3. Hadoop data is not sequenced and is in 64 MB to 256 MB block sizes of delimited record values, with the schema applied on read based on:
a) HCatalog
b) Hive
c) Hbase
d) All of the mentioned

View Answer

Answer: a [Reason:] Other means of tagging the values can also be used.

4. __________ are highly resilient and eliminate the single-point-of-failure risk with traditional Hadoop deployments.
a) EMR
b) Isilon solutions
c) AWS
d) None of the mentioned

View Answer

Answer: b [Reason:] Isilon solutions also provide enterprise data protection and security options, including file system auditing and data-at-rest encryption, to address compliance requirements.

5. Point out the wrong statement:
a) EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and highly efficient storage platform
b) Isilon’s native HDFS integration means you can avoid the need to invest in a separate Hadoop infrastructure
c) NoSQL systems do provide high latency access and accommodate fewer concurrent users
d) None of the mentioned

View Answer

Answer: c [Reason:] NoSQL systems do provide low latency access and accommodate many concurrent users.

6. HDFS and NoSQL file systems focus almost exclusively on adding nodes to:
a) Scale out
b) Scale up
c) Both Scale out and up
d) None of the mentioned

View Answer

Answer: a [Reason:] HDFS and NoSQL file systems focus almost exclusively on adding nodes to increase performance (scale-out) but even they require node configuration with elements of scale up.

7. Which is the most popular NoSQL database for a scalable big data store with Hadoop?
a) Hbase
b) MongoDB
c) Cassandra
d) None of the mentioned

View Answer

Answer: a [Reason:] HBase is the Hadoop database: a distributed, scalable Big Data store that lets you host very large tables — billions of rows multiplied by millions of columns — on clusters built with commodity hardware.
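
As a minimal sketch of what hosting such a table looks like from client code, the following Java snippet writes and reads one cell using the older 0.9x-era HTable API. The "metrics" table name and "d" column family are assumptions for illustration; the table is assumed to already exist.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml from the classpath
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "metrics");

            // Write one cell: row key -> column family:qualifier -> value
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes("42"));
            table.put(put);

            // Read it back
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count"));
            System.out.println(Bytes.toString(value));

            table.close();
        }
    }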

8. The ___________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.
a) DataCache
b) DistributedData
c) DistributedCache
d) All of the mentioned

View Answer

Answer: c [Reason:] The child-jvm always has its current working directory added to the java.library.path and LD_LIBRARY_PATH.
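
A minimal driver-side sketch of both uses, assuming the jar and native library already sit at the hypothetical HDFS paths shown, using the classic org.apache.hadoop.filecache.DistributedCache API:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class CacheSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Add a jar to the classpath of every map/reduce task's child JVM
            DistributedCache.addFileToClassPath(new Path("/libs/parser.jar"), conf);

            // Ship a native library; the fragment after '#' names the symlink
            // created in the task's working directory, which is already on
            // java.library.path and LD_LIBRARY_PATH
            DistributedCache.addCacheFile(new URI("/libs/libparser.so#libparser.so"), conf);
            DistributedCache.createSymlink(conf);
        }
    }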

9. HBase provides ___________ like capabilities on top of Hadoop and HDFS.
a) TopTable
b) BigTop
c) Bigtable
d) None of the mentioned

View Answer

Answer: c [Reason:] Google Bigtable leverages the distributed data storage provided by the Google File System.

10. _______ refers to incremental costs with no major impact on solution design, performance and complexity.
a) Scale-out
b) Scale-down
c) Scale-up
d) None of the mentioned

View Answer

Answer: c [Reason:] Adding more CPU/RAM/disk capacity to a Hadoop DataNode that is already part of a cluster does not require additional network switches.

Hadoop MCQ Set 2

1. For running Hadoop service daemons in secure mode, ___________ principals are required.
a) SSL
b) Kerberos
c) SSH
d) None of the mentioned

View Answer

Answer: b [Reason:] Each service reads authentication information saved in a keytab file with appropriate permissions.

2. Point out the correct statement:
a) Hadoop has its own definition of groups
b) The MapReduce JobHistory server runs as the same user, such as mapred
c) An SSO environment is managed using Kerberos with LDAP for Hadoop in secure mode
d) None of the mentioned

View Answer

Answer: c [Reason:] You can change the mapping mechanism by specifying the name of the mapping provider as the value of hadoop.security.group.mapping.

3. The simplest way to do authentication is using the _________ command of Kerberos.
a) auth
b) kinit
c) authorize
d) all of the mentioned

View Answer

Answer: b [Reason:] HTTP web-consoles should be served by a principal different from that of RPC.
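
From a shell, kinit obtains a ticket for a principal (e.g. kinit user@REALM); the programmatic equivalent inside a Hadoop client is UserGroupInformation. A minimal sketch, with a placeholder principal and keytab path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Principal and keytab path are placeholders for illustration
            UserGroupInformation.loginUserFromKeytab(
                "hdfs/node1.example.com@EXAMPLE.COM",
                "/etc/security/keytab/hdfs.keytab");

            System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
        }
    }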

4. Data transfer between the web console and clients is protected by using:
a) SSL
b) Kerberos
c) SSH
d) None of the mentioned

View Answer

Answer: a [Reason:] AES offers the greatest cryptographic strength and the best performance.

5. Point out the wrong statement:
a) Data transfer protocol of DataNode does not use the RPC framework of Hadoop
b) Apache Oozie which access the services of Hadoop on behalf of end users need to be able to impersonate end users
c) DataNode must authenticate itself by using privileged ports which are specified by dfs.datanode.address and dfs.datanode.http.address
d) None of the mentioned

View Answer

Answer: d [Reason:] Authentication is based on the assumption that the attacker won’t be able to get root privileges.

6. In order to turn on RPC authentication in Hadoop, set the value of the hadoop.security.authentication property to:
a) zero
b) kerberos
c) false
d) none of the mentioned

View Answer

Answer: b [Reason:] Security settings need to be modified properly for robustness.
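
The property normally lives in core-site.xml; the following minimal Java sketch sets it programmatically (just for illustration) and verifies that secure mode is on. The property names are real Hadoop configuration keys.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class SecureModeCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // "simple" (the default) disables RPC authentication;
            // "kerberos" turns it on
            conf.set("hadoop.security.authentication", "kerberos");
            conf.set("hadoop.security.authorization", "true");

            UserGroupInformation.setConfiguration(conf);
            System.out.println("Secure mode: " + UserGroupInformation.isSecurityEnabled());
        }
    }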

7. The __________ provides a proxy between the web applications exported by an application and an end user.
a) ProxyServer
b) WebAppProxy
c) WebProxy
d) None of the mentioned

View Answer

Answer: b [Reason:] If security is enabled it will warn users before accessing a potentially unsafe web application. Authentication and authorization using the proxy is handled just like any other privileged web application.

8. ___________ is used by the YARN framework to define how any container is launched and controlled.
a) Container
b) ContainerExecutor
c) Executor
d) All of the mentioned

View Answer

Answer: b [Reason:] The container process has the same Unix user as the NodeManager.
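
The executor is selected through the yarn.nodemanager.container-executor.class property, normally set in yarn-site.xml; a small sketch of the equivalent programmatic setting, with LinuxContainerExecutor as the value:

    import org.apache.hadoop.conf.Configuration;

    public class ExecutorConfig {
        public static void main(String[] args) {
            // Normally configured in yarn-site.xml; shown in code for illustration
            Configuration conf = new Configuration();
            conf.set("yarn.nodemanager.container-executor.class",
                "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
            System.out.println(conf.get("yarn.nodemanager.container-executor.class"));
        }
    }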

9. The ____________ requires that paths including and leading up to the directories specified in yarn.nodemanager.local-dirs be set with 755 permissions.
a) TaskController
b) LinuxTaskController
c) LinuxController
d) None of the mentioned

View Answer

Answer: b [Reason:] LinuxTaskController keeps track of all paths and directories on the DataNode.

10. The configuration file must be owned by the user running:
a) DataManager
b) NodeManager
c) ValidationManager
d) None of the mentioned

View Answer

Answer: b [Reason:] To recap, local file-system permissions need to be modified.

Hadoop MCQ Set 3

1. Apache _______ is a serialization framework that produces data in a compact binary format.
a) Oozie
b) Impala
c) Kafka
d) Avro

View Answer

Answer: d [Reason:] Apache Avro doesn’t require proxy objects or code generation.

2. Point out the correct statement:
a) Apache Avro is a framework that allows you to serialize data in a format that has a schema built in
b) The serialized data is in a compact binary format that doesn’t require proxy objects or code generation
c) Including schemas with the Avro messages allows any application to deserialize the data
d) All of the mentioned

View Answer

Answer: d [Reason:] Instead of using generated proxy libraries and strong typing, Avro relies heavily on the schemas that are sent along with the serialized data.

3. Avro schemas describe the format of the message and are defined using:
a) JSON
b) XML
c) JS
d) All of the mentioned

View Answer

Answer: a [Reason:] The JSON schema content is put into a file.
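
A minimal sketch of parsing such a JSON schema in Java; the Model record with make and year fields is a hypothetical example:

    import org.apache.avro.Schema;

    public class SchemaExample {
        public static void main(String[] args) {
            // A hypothetical record schema, defined in JSON
            String json =
                "{\"type\": \"record\", \"name\": \"Model\", \"fields\": ["
              + "  {\"name\": \"make\", \"type\": \"string\"},"
              + "  {\"name\": \"year\", \"type\": \"int\"}"
              + "]}";

            Schema schema = new Schema.Parser().parse(json);
            System.out.println(schema.getName());   // Model
            System.out.println(schema.getFields()); // make, year
        }
    }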

4. The ____________ is an iterator which reads through the file and returns objects using the next() method.
a) DatReader
b) DatumReader
c) DatumRead
d) None of the mentioned

View Answer

Answer: b [Reason:] DatumReader reads the content through the DataFileReader implementation.
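
A minimal Java sketch of that iteration pattern using the generic (non-code-generated) API; models.avro is a placeholder file name, and the schema is read from the file header:

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;

    public class ReadAvroFile {
        public static void main(String[] args) throws Exception {
            DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
            DataFileReader<GenericRecord> fileReader =
                new DataFileReader<>(new File("models.avro"), datumReader);

            // DataFileReader is an iterator: next() returns each deserialized object
            while (fileReader.hasNext()) {
                GenericRecord record = fileReader.next();
                System.out.println(record);
            }
            fileReader.close();
        }
    }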

5. Point out the wrong statement:
a) Java code is used to deserialize the contents of the file into objects
b) Avro allows you to use complex data structures within Hadoop MapReduce jobs
c) The m2e plug-in automatically downloads the newly added JAR files and their dependencies
d) None of the mentioned

View Answer

Answer: d [Reason:] A unit test is useful because you can make assertions to verify that the values of the deserialized object are the same as the original values.

6. The ____________ class extends and implements several Hadoop-supplied interfaces.
a) AvroReducer
b) Mapper
c) AvroMapper
d) None of the mentioned

View Answer

Answer: c [Reason:] AvroMapper is used to provide the ability to collect or map data.

7. The ____________ class accepts the values that the ModelCountMapper object has collected.
a) AvroReducer
b) Mapper
c) AvroMapper
d) None of the mentioned

View Answer

Answer: a [Reason:] AvroReducer summarizes them by looping through the values.

8. The ________ method in the ModelCountReducer class “reduces” the values the mapper collects into a derived value.
a) count
b) add
c) reduce
d) all of the mentioned

View Answer

Answer: c [Reason:] In some cases, it can be a simple sum of the values.
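
A sketch of how such a mapper/reducer pair might look with the org.apache.avro.mapred API; the ModelCountMapper and ModelCountReducer names come from the questions above, while the Utf8 input type and the counting logic are assumptions for illustration:

    import java.io.IOException;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.Pair;
    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.mapred.Reporter;

    public class ModelCount {

        // Collects (model, 1) for every input record
        public static class ModelCountMapper
                extends AvroMapper<Utf8, Pair<Utf8, Long>> {
            @Override
            public void map(Utf8 modelName, AvroCollector<Pair<Utf8, Long>> collector,
                            Reporter reporter) throws IOException {
                collector.collect(new Pair<Utf8, Long>(modelName, 1L));
            }
        }

        // "Reduces" the collected values to a simple sum per model
        public static class ModelCountReducer
                extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
            @Override
            public void reduce(Utf8 model, Iterable<Long> counts,
                               AvroCollector<Pair<Utf8, Long>> collector,
                               Reporter reporter) throws IOException {
                long total = 0;
                for (long c : counts) total += c;
                collector.collect(new Pair<Utf8, Long>(model, total));
            }
        }
    }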

9. Which of the following works well with Avro?
a) Lucene
b) Kafka
c) MapReduce
d) None of the mentioned

View Answer

Answer: c [Reason:] You can use Avro and MapReduce together to process many items serialized with Avro’s small binary format.

10. The __________ tools are used to generate proxy objects in Java to easily work with the objects.
a) Lucene
b) Kafka
c) MapReduce
d) Avro

View Answer

Answer: d [Reason:] Avro serialization includes the schema with it — in JSON format — which allows you to have different versions of objects.

Hadoop MCQ Set 4

1. Spark was initially started by ____________ at UC Berkeley AMPLab in 2009.
a) Mahek Zaharia
b) Matei Zaharia
c) Doug Cutting
d) Stonebraker

View Answer

Answer: b [Reason:] Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley.

2. Point out the correct statement:
a) RSS abstraction provides distributed task dispatching, scheduling, and basic I/O functionalities
b) For cluster management, Spark supports standalone and Hadoop YARN cluster managers
c) Hive SQL is a component on top of Spark Core
d) None of the mentioned

View Answer

Answer: b [Reason:] Spark requires a cluster manager and a distributed storage system.

3. ____________ is a component on top of Spark Core.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned

View Answer

Answer: b [Reason:] Spark SQL introduces a new data abstraction called SchemaRDD, which provides support for structured and semi-structured data.

4. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.
a) Spark Streaming
b) Spark SQL
c) RDDs
d) All of the mentioned

View Answer

Answer: c [Reason:] Spark SQL provides SQL language support, with command-line interfaces and ODBC/JDBC server.
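
A minimal sketch of that RDD manipulation style in Java (Scala and Python look similar), using a local master just for illustration:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class RddExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("RddExample").setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Build an RDD and chain transformations; reduce() triggers execution
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            JavaRDD<Integer> squares = numbers.map(n -> n * n);
            int sum = squares.reduce((a, b) -> a + b);

            System.out.println(sum); // 55
            sc.close();
        }
    }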

5. Point out the wrong statement:
a) For distributed storage, Spark can interface with a wide variety of systems, including the Hadoop Distributed File System (HDFS)
b) Spark also supports a pseudo-distributed mode, usually used only for development or testing purposes
c) Spark had over 465 contributors in 2014
d) All of the mentioned

View Answer

Answer: d [Reason:] Spark is the most active project in the Apache Software Foundation and among Big Data open source projects.

6. ______________ leverages Spark Core’s fast scheduling capability to perform streaming analytics.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs

View Answer

Answer: b [Reason:] Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data.
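
A minimal Java sketch of the mini-batch model: a 1-second batch interval turns a socket stream (a hypothetical source on localhost:9999) into a sequence of RDDs:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingExample {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("StreamingExample").setMaster("local[2]");
            // Each 1-second mini-batch becomes an RDD
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

            JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
            JavaDStream<Long> counts = lines.count(); // record count per mini-batch
            counts.print();

            jssc.start();
            jssc.awaitTermination();
        }
    }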

7. ____________ is a distributed machine learning framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) RDDs

View Answer

Answer: a [Reason:] MLlib implements many common machine learning and statistical algorithms to simplify large scale machine learning pipelines.

8. ________ is a distributed graph processing framework on top of Spark.
a) MLlib
b) Spark Streaming
c) GraphX
d) All of the mentioned

View Answer

Answer: c [Reason:] GraphX started initially as a research project at UC Berkeley AMPLab and Databricks, and was later donated to the Spark project.

9. GraphX provides an API for expressing graph computation that can model the __________ abstraction.
a) GaAdt
b) Spark Core
c) Pregel
d) None of the mentioned

View Answer

Answer: c [Reason:] GraphX provides an API that can express Pregel-style iterative graph computation.

10. Spark architecture is ___________ times as fast as Hadoop disk-based Apache Mahout and even scales better than Vowpal Wabbit.
a) 10
b) 20
c) 50
d) 100

View Answer

Answer: a [Reason:] Spark architecture has proven scalability to over 8000 nodes in production.

Hadoop MCQ Set 5

1. ____________ is a distributed real-time computation system for processing large volumes of high-velocity data.
a) Kafka
b) Storm
c) Lucene
d) BigTop

View Answer

Answer: b [Reason:] Storm on YARN is powerful for scenarios requiring real-time analytics, machine learning and continuous monitoring of operations.

2. Point out the correct statement:
a) A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways
b) Apache Storm is a free and open source distributed realtime computation system
c) Storm integrates with the queueing and database technologies you already use
d) All of the mentioned

View Answer

Answer: d [Reason:] Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
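
A minimal sketch of such a topology in Java, assuming Storm 1.x package names; the word spout and print bolt are toy examples, run on an in-process LocalCluster for illustration:

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class SimpleTopology {

        // Spout: emits an endless stream of words
        public static class WordSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private final String[] words = {"hadoop", "storm", "spark"};
            private int i = 0;

            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                this.collector = collector;
            }
            public void nextTuple() {
                Utils.sleep(100);
                collector.emit(new Values(words[i++ % words.length]));
            }
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }

        // Bolt: processes each tuple as it arrives
        public static class PrintBolt extends BaseBasicBolt {
            public void execute(Tuple tuple, BasicOutputCollector collector) {
                System.out.println(tuple.getStringByField("word"));
            }
            public void declareOutputFields(OutputFieldsDeclarer declarer) {}
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new WordSpout(), 1);
            builder.setBolt("print", new PrintBolt(), 2).shuffleGrouping("words");

            // Run in-process here; StormSubmitter deploys to a real cluster
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("demo", new Config(), builder.createTopology());
        }
    }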

3. Storm integrates with __________ via Apache Slider.
a) Scheduler
b) YARN
c) Compaction
d) All of the mentioned

View Answer

Answer: b [Reason:] Apache Slider is a YARN application that deploys and manages long-running distributed applications such as Storm on a YARN cluster.

4. For Apache __________ users, Storm utilizes the same ODBC interface.
a) cTakes
b) Hive
c) Pig
d) Oozie

View Answer

Answer: b [Reason:] You don’t have to worry about re-inventing the implementation wheel.

5. Point out the wrong statement:
a) Storm is difficult and can be used with only Java
b) Storm is fast: a benchmark clocked it at over a million tuples processed per second per node
c) Storm is scalable, fault-tolerant, guarantees your data will be processed
d) All of the mentioned

View Answer

Answer: a [Reason:] Storm is simple and can be used with any programming language.

6. Storm is benchmarked as processing one million _______ byte messages per second per node.
a) 10
b) 50
c) 100
d) 200

View Answer

Answer: c [Reason:] Storm is a distributed realtime computation system.

7. Apache Storm added open source stream data processing to the _________ Data Platform.
a) Cloudera
b) Hortonworks
c) Local Cloudera
d) MapR

View Answer

Answer: b [Reason:] The Storm community is working to improve capabilities related to three important themes: business continuity, operations and developer productivity.

8. How many types of nodes are present in a Storm cluster?
a) 1
b) 2
c) 3
d) 4

View Answer

Answer: c [Reason:] A Storm cluster has three types of nodes: Nimbus (master), ZooKeeper, and Supervisor (worker) nodes.

9. __________ node distributes code across the cluster.
a) Zookeeper
b) Nimbus
c) Supervisor
d) None of the mentioned

View Answer

Answer: b [Reason:] The Nimbus node is the master node, similar to the Hadoop JobTracker.

10. ____________ communicates with Nimbus through ZooKeeper, and starts and stops workers according to signals from Nimbus.
a) Zookeeper
b) Nimbus
c) Supervisor
d) None of the mentioned

View Answer

Answer: c [Reason:] The Supervisor daemon starts and stops worker processes on its node; ZooKeeper nodes coordinate the Storm cluster.