
Hadoop MCQ Set 1

1. The Mapper implementation processes one line at a time via the _________ method.
a) map
b) reduce
c) mapper
d) reducer

Answer: a [Reason:] The Mapper implementation processes one line at a time via its map method; the Mapper outputs are then sorted and partitioned per Reducer.
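
For reference, a minimal sketch of a word-count style Mapper written against the older org.apache.hadoop.mapred API; the class and field names are illustrative. With TextInputFormat, the framework calls map once per input line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Word-count style Mapper: map() is invoked once per input line
// (the key is the byte offset, the value is the line's text).
public class TokenizerMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);   // emit intermediate (word, 1) pairs
    }
  }
}
```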

2. Point out the correct statement :
a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
b) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
c) Mapper and Reducer interfaces form the core of the job
d) All of the mentioned

Answer: d [Reason:] The transformed intermediate records do not need to be of the same type as the input records.

3. The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
a) OutputSplit
b) InputSplit
c) InputSplitStream
d) All of the mentioned

Answer: b [Reason:] Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and override it to initialize themselves.

4. Users can control which keys (and hence records) go to which Reducer by implementing a custom :
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned

Answer: a [Reason:] Users can control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
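
As an illustration, a small custom Partitioner sketch for the old mapred API; the first-letter routing policy and the class name are made up for the example.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Routes keys to reduce tasks by their first character, so all keys starting
// with the same letter land in the same partition (illustrative policy only).
public class FirstLetterPartitioner implements Partitioner<Text, IntWritable> {

  @Override
  public void configure(JobConf job) {
    // no configuration needed for this example
  }

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // mask to keep the result non-negative before taking the modulus
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}
```

It would be registered on the job with JobConf.setPartitionerClass(FirstLetterPartitioner.class).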

5. Point out the wrong statement :
a) The Mapper outputs are sorted and then partitioned per Reducer
b) The total number of partitions is the same as the number of reduce tasks for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) None of the mentioned

Answer: d [Reason:] All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.

6. Applications can use the ____________ to report progress and set application-level status messages.
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned

Answer: c [Reason:] Reporter is also used to update Counters, or just indicate that they are alive.
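
A sketch of how a map task might use the Reporter to update a Counter, set a status message, and signal liveness; the counter enum and class name are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Uses the Reporter facilities mentioned above: Counters, status messages,
// and progress() to indicate the task is alive while processing slow records.
public class AuditMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  enum Records { MALFORMED }   // hypothetical application-level counter

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    if (value.getLength() == 0) {
      reporter.incrCounter(Records.MALFORMED, 1);           // update a Counter
      return;
    }
    reporter.setStatus("processing offset " + key.get());   // status message
    reporter.progress();                                     // "still alive"
    output.collect(value, new IntWritable(1));
  }
}
```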

7. The right level of parallelism for maps seems to be around _________ maps per node.
a) 1-10
b) 10-100
c) 100-150
d) 150-200

Answer: b [Reason:] Task setup takes a while, so it is best if the maps take at least a minute to execute.

8. The number of reduces for the job is set by the user via :
a) JobConf.setNumTasks(int)
b) JobConf.setNumReduceTasks(int)
c) JobConf.setNumMapTasks(int)
d) All of the mentioned

Answer: b [Reason:] Reducer has 3 primary phases: shuffle, sort and reduce.
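
A minimal driver sketch showing where JobConf.setNumReduceTasks is called; the job name, paths, and reducer count are placeholders.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Driver sketch: the user sets the number of reduce tasks on the JobConf.
public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountDriver.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setNumReduceTasks(4);   // the setting this question is about

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);      // submit the job and wait for completion
  }
}
```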

9. The framework groups Reducer inputs by key in _________ stage.
a) sort
b) shuffle
c) reduce
d) none of the mentioned

Answer: a [Reason:] The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.

10. The output of the reduce task is typically written to the FileSystem via _____________
a) OutputCollector.collect
b) OutputCollector.get
c) OutputCollector.receive
d) OutputCollector.put

Answer: a [Reason:] The output of the Reducer is not sorted.
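
To round out the word-count example, a Reducer sketch that writes its output through OutputCollector.collect (old mapred API, illustrative names).

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums the grouped values for each key and writes the result to the
// FileSystem through OutputCollector.collect.
public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));   // final (word, count) record
  }
}
```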

Hadoop MCQ Set 2

1. You need to have _________ installed before running ZooKeeper.
a) Java
b) C
c) C++
d) SQLGUI

Answer: a [Reason:] Client bindings are available in several other languages.

2. Point out the correct statement :
a) All znodes are ephemeral, which means they are describing a “temporary” state
b) /hbase/replication/state contains the list of RegionServers in the main cluster
c) Offline snapshots are coordinated by the Master using ZooKeeper to communicate with the RegionServers using a two-phase-commit-like transaction
d) None of the mentioned

Answer: a [Reason:] Although the Replication znodes do not describe a temporary state, they are meant to be the source of truth for the replication state, describing the replication state of each machine.

3. How many types of special znodes are present in ZooKeeper?
a) 1
b) 2
c) 3
d) All of the mentioned

Answer: b [Reason:] There are two special types of znode: sequential and ephemeral.
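
A short sketch using the ZooKeeper Java client to create one znode of each special type; the connect string and paths are placeholders.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Creates an ephemeral znode and a sequential znode with the Java client.
public class SpecialZnodes {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> { });

    // Ephemeral: removed automatically when this client's session ends.
    zk.create("/demo-ephemeral", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Sequential: the server appends a monotonically increasing counter
    // to the name, e.g. /demo-seq-0000000001.
    String actualPath = zk.create("/demo-seq-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
    System.out.println("created " + actualPath);

    zk.close();
  }
}
```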

4. To register a “watch” on a znode’s data, you need to use the _______ command to access the current content or metadata.
a) stat
b) put
c) receive
d) gets

Answer: a [Reason:] ZooKeeper can also notify you of changes in a znode’s content or changes in a znode’s children.
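
A sketch of registering watches from the Java client, roughly the programmatic equivalent of reading a znode’s metadata and content with a watch from the CLI; the path and timeouts are illustrative.

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Reads a znode's metadata and data while registering a watch on it.
public class WatchZnode implements Watcher {
  public static void main(String[] args) throws Exception {
    WatchZnode watcher = new WatchZnode();
    ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, watcher);

    Stat stat = zk.exists("/demo", watcher);                 // metadata + watch
    if (stat != null) {
      byte[] data = zk.getData("/demo", watcher, stat);      // content + watch
      System.out.println("version " + stat.getVersion()
          + ", bytes " + data.length);
    }

    Thread.sleep(60_000);   // keep the session open long enough to see events
    zk.close();
  }

  @Override
  public void process(WatchedEvent event) {
    System.out.println("watch fired: " + event.getType() + " " + event.getPath());
  }
}
```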

5. Point out the wrong statement :
a) All the znodes are prefixed using the default /hbase location
b) ZooKeeper provides an interactive shell that allows you to explore the ZooKeeper state
c) The znodes that you’ll most often see are the ones that coordinate operations like Region Assignment
d) None of the mentioned

Answer: d [Reason:] The HBase root znode path is configurable using hbase-site.xml, and by default the location is “/hbase”.

6. _______ has a design policy of using ZooKeeper only for transient data.
a) Hive
b) Impala
c) HBase
d) Oozie

Answer: c [Reason:] If HBase’s ZooKeeper data is removed, only transient operations are affected; data can continue to be written to and read from HBase.

7. ZooKeeper keeps track of cluster state, such as the ______ table location.
a) DOMAIN
b) NODE
c) ROOT
d) All of the mentioned

Answer: c [Reason:] ZooKeeper also keeps track of the list of online RegionServers and of unassigned Regions.

8. The ________ master will register its own address in this znode at startup, making this znode the source of truth for identifying which server is the Master.
a) active
b) passive
c) region
d) all of the mentioned

Answer: a [Reason:] Each inactive Master will register itself as a backup Master by creating a sub-znode.

9. ___________ is used to decommission more than one RegionServer at a time by creating sub-znodes.
a) /hbase/master
b) /hbase/draining
c) /hbase/passive
d) None of the mentioned

Answer: b [Reason:] /hbase/draining lets you decommission multiple RegionServers without having the risk of regions temporarily moved to a RegionServer that will be decommissioned later.

10. The ______ znode is used for synchronizing the changes made to the _acl_ table by the grant/revoke commands.
a) zcl
b) acl
c) scl
d) bnl

Answer: b [Reason:] Each table will have a sub-znode (/hbase/acl/tableName) containing the ACLs of the table.

Hadoop MCQ Set 3

1. A ___________ node enables a workflow to make a selection on the execution path to follow.
a) fork
b) decision
c) start
d) none of the mentioned

Answer: b [Reason:] All decision nodes must have a default element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.

2. Point out the correct statement :
a) Predicates are JSP Expression Language (EL) expressions
b) Predicates are evaluated in order of appearance until one of them evaluates to true and the corresponding transition is taken
c) The name attribute in the decision node is the name of the decision node
d) All of the mentioned

Answer: d [Reason:] The predicate ELs are evaluated in order until one returns true and the corresponding transition is taken.

3. Which of the following can be seen as a switch-case statement?
a) fork
b) decision
c) start
d) none of the mentioned

Answer: b [Reason:] A decision node consists of a list of predicates-transition pairs plus a default transition.

4. All decision nodes must have a _____________ element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.
a) name
b) default
c) server
d) client

Answer: b [Reason:] The default element indicates the transition to take if none of the predicates evaluates to true.

5. Point out the wrong statement :
a) The fork and join nodes must be used in pairs
b) The fork node assumes concurrent execution paths are children of the same fork node
c) A join node waits until every concurrent execution path of a previous fork node arrives to it
d) A fork node splits one path of execution into multiple concurrent paths of execution

Answer: b [Reason:] The join node assumes concurrent execution paths are children of the same fork node.

6. The ___________ attribute in the join node is the name of the workflow join node.
a) name
b) to
c) down
d) none of the mentioned

Answer: a [Reason:] The to attribute in the join node indicates the name of the workflow node that will be executed after all concurrent execution paths of the corresponding fork arrive at the join node.

7. If a computation/processing task (triggered by a workflow) fails to complete successfully, it transitions to:
a) error
b) ok
c) true
d) false

Answer: a [Reason:] If a computation/processing task (triggered by a workflow) completes successfully, it transitions to ok.

8. If the failure is of ___________ nature, Oozie will suspend the workflow job.
a) transient
b) non-transient
c) permanent
d) all of the mentioned

Answer: b [Reason:] If the failure is an error and a retry will not resolve the problem, Oozie will perform the error transition for the action.

9. A _______________ action can be configured to perform file system cleanup and directory creation before starting the map reduce job.
a) map
b) reduce
c) map-reduce
d) none of the mentioned

Answer: c [Reason:] The map-reduce action starts a Hadoop map/reduce job from a workflow.
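
A sketch of submitting such a workflow application with the Oozie Java client (OozieClient); the Oozie URL, NameNode address, ResourceManager address, and application path are placeholders.

```java
import java.util.Properties;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

// Submits and runs a workflow application (for example one containing a
// map-reduce action) through the Oozie Java client.
public class SubmitWorkflow {
  public static void main(String[] args) throws Exception {
    OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

    Properties conf = oozie.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/user/demo/wf-app");
    conf.setProperty("nameNode", "hdfs://namenode:8020");
    conf.setProperty("jobTracker", "resourcemanager:8032");

    String jobId = oozie.run(conf);               // submit and start the job
    System.out.println("workflow job " + jobId + " submitted");

    WorkflowJob job = oozie.getJobInfo(jobId);    // poll its current status
    System.out.println("status: " + job.getStatus());
  }
}
```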

10. ___________ properties can be overridden by specifying them in the job-xml file or configuration element.
a) Pipe
b) Decision
c) Flag
d) None of the mentioned

Answer: a [Reason:] Pipes information can be specified in the pipes element.

Hadoop MCQ Set 4

1. Yarn commands are invoked by the ________ script.
a) hive
b) bin
c) hadoop
d) home

Answer: b [Reason:] Running the yarn script without any arguments prints the description for all commands.

2. Point out the correct statement :
a) Each queue has strict ACLs which controls which users can submit applications to individual queues
b) Hierarchy of queues is supported to ensure resources are shared among the sub-queues of an organization
c) Queues are allocated a fraction of the capacity of the grid in the sense that a certain capacity of resources will be at their disposal
d) All of the mentioned

Answer: d [Reason:] All applications submitted to a queue will have access to the capacity allocated to the queue.

3. The queue definitions and properties, such as ________ and ACLs, can be changed at runtime.
a) tolerant
b) capacity
c) speed
d) all of the mentioned

Answer: b [Reason:] Administrators can add additional queues at runtime, but queues cannot be deleted at runtime.

4. The CapacityScheduler has a pre-defined queue called :
a) domain
b) root
c) rear
d) all of the mentioned

Answer: b [Reason:] All queues in the system are children of the root queue.
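
A small sketch that walks the queue hierarchy down from the pre-defined root queue using the YarnClient API; it assumes a reachable cluster configured through yarn-site.xml on the classpath and that the scheduler exposes the root queue under the name "root".

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.QueueInfo;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Prints each queue's configured capacity, starting from the root queue.
public class PrintQueues {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new Configuration());   // picks up yarn-site.xml from the classpath
    yarn.start();

    print(yarn.getQueueInfo("root"), "");

    yarn.stop();
  }

  private static void print(QueueInfo queue, String indent) {
    System.out.printf("%s%s (capacity %.2f)%n",
        indent, queue.getQueueName(), queue.getCapacity());
    List<QueueInfo> children = queue.getChildQueues();
    if (children != null) {
      for (QueueInfo child : children) {
        print(child, indent + "  ");
      }
    }
  }
}
```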

5. Point out the wrong statement :
a) A multiple of the queue capacity can be configured to allow a single user to acquire more resources
b) Changing queue properties and adding new queues is very simple
c) Queues cannot be deleted, only addition of new queues is supported
d) None of the mentioned

Answer: d [Reason:] To change queue properties, edit conf/capacity-scheduler.xml and run yarn rmadmin -refreshQueues.

6. The updated queue configuration should be valid, i.e. the queue-capacity at each level should sum to:
a) 50%
b) 75%
c) 100%
d) 0%

Answer: c [Reason:] Queues cannot be deleted, only addition of new queues is supported.

7. Users can bundle their Yarn code in a _________ file and execute it using jar command.
a) java
b) jar
c) C code
d) xml

Answer: b [Reason:] Usage: yarn jar <jar> [mainClass] args…

8. Which of the following commands is used to dump the container logs?
a) logs
b) log
c) dump
d) all of the mentioned

Answer: a [Reason:] Usage: yarn logs -applicationId <application ID>.

9. __________ will clear the RMStateStore and is useful if past applications are no longer needed.
a) -format-state
b) -form-state-store
c) -format-state-store
d) none of the mentioned

Answer: c [Reason:] -format-state-store formats the RMStateStore.

10. Which of the following commands runs the ResourceManager admin client?
a) proxyserver
b) run
c) admin
d) rmadmin

Answer: d [Reason:] The proxyserver command starts the web proxy server.

Hadoop MCQ Set 5

1. The Amazon ____________ is a Web-based service that allows business subscribers to run application programs in the Amazon.com computing environment.
a) EC3
b) EC4
c) EMR
d) None of the mentioned

Answer: d [Reason:] The service described is Amazon EC2, which is not among the options; use Amazon EC2 for scalable computing capacity in the AWS cloud so you can develop and deploy applications without hardware constraints.

2. Point out the correct statement :
a) Amazon Web Services offers reliable, scalable, and inexpensive cloud computing services
b) MongoDB runs well on Amazon EC2
c) To deploy MongoDB on EC2, you can set up a new instance manually
d) All of the mentioned

Answer: d [Reason:] MongoDB on EC2 can be deployed easily by using sharded cluster management.

3. Amazon ___________ is a Web service that provides real-time monitoring to Amazon’s EC2 customers.
a) AmWatch
b) CloudWatch
c) IamWatch
d) All of the mentioned

Answer: b [Reason:] The current AMIs for all CoreOS channels and EC2 regions are updated frequently.

4. Amazon ___________ provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios.
a) EC2
b) EC3
c) EC4
d) All of the mentioned

Answer: a [Reason:] Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use.

5. Point out the wrong statement :
a) Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud
b) Amazon EC2 is designed to make web-scale cloud computing easier for developers.
c) Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction.
d) None of the mentioned

Answer: d [Reason:] Amazon EC2 reduces the time required to obtain and boot new server instances to minutes.

6. Amazon EC2 provides virtual computing environments, known as :
a) chunks
b) instances
c) messages
d) none of the mentioned

Answer: b [Reason:] Using Amazon EC2 eliminates your need to invest in hardware up front.

7. Amazon ___________ is well suited to transferring bulk amounts of data.
a) EC2
b) EC3
c) EC4
d) All of the mentioned

Answer: a [Reason:] Amazon EC2 enables you to scale up or down to handle changes in requirements or spikes in popularity, reducing your need to forecast traffic.

8. The EC2 can serve as a practically unlimited set of ___________ machines.
a) virtual
b) real
c) distributed
d) all of the mentioned

Answer: a [Reason:] To use the EC2, a subscriber creates an Amazon Machine Image (AMI) containing the operating system, application programs and configuration settings.
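
A sketch of launching an instance from a registered AMI with the AWS SDK for Java (v1); the AMI ID and instance type are placeholders, and credentials and region come from the SDK's default provider and configuration chains.

```java
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;

// Launches a single EC2 instance from a registered AMI.
public class LaunchInstance {
  public static void main(String[] args) {
    AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

    RunInstancesRequest request = new RunInstancesRequest()
        .withImageId("ami-0123456789abcdef0")   // hypothetical AMI ID
        .withInstanceType("t2.micro")
        .withMinCount(1)
        .withMaxCount(1);

    RunInstancesResult result = ec2.runInstances(request);
    String instanceId =
        result.getReservation().getInstances().get(0).getInstanceId();
    System.out.println("launched " + instanceId);
  }
}
```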

9. EC2 capacity can be increased or decreased in real time from as few as one to more than ___________ virtual machines simultaneously.
a) 1000
b) 2000
c) 3000
d) None of the mentioned

Answer: a [Reason:] Billing takes place according to the computing and network resources consumed.

10. An AMI is uploaded to Amazon _______ and registered with Amazon EC2, creating a so-called AMI identifier (AMI ID).
a) S2
b) S3
c) S4
d) S5

Answer: b [Reason:] Amazon S3 stands for Amazon Simple Storage Service.