Hadoop MCQ Set 1

1. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication

View Answer

Answer: b [Reason:] All the metadata related to HDFS, including information about DataNodes, files stored on HDFS, replication, and so on, is stored and maintained on the NameNode.

2. Point out the correct statement:
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned

View Answer

Answer: a [Reason:] There can be any number of DataNodes in a Hadoop Cluster.

3. HDFS works in a __________ fashion.
a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned

View Answer

Answer: a [Reason:] The NameNode serves as the master and each DataNode serves as a worker/slave.

4. ________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned

View Answer

Answer: c [Reason:] The Secondary NameNode is used to maintain availability and reliability at all times.

5. Point out the wrong statement:
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong

View Answer

Answer: d [Reason:] It is the NameNode, not the DataNode, that is aware of the files to which the blocks belong.
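
Option (a) above notes that the replication factor can be set cluster-wide (default 3) and per file. As a minimal Java sketch of the per-file case, using the HDFS client API (the path /data/important.log and the factor 5 are made-up example values, and a reachable HDFS cluster is assumed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);            // client for the configured file system
            Path file = new Path("/data/important.log");     // hypothetical existing file
            boolean changed = fs.setReplication(file, (short) 5);  // per-file replication factor
            System.out.println("Replication changed: " + changed);
        }
    }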

6. Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned

View Answer

Answer: a [Reason:] HDFS can be used for storing archive data cheaply, since it allows storing data on low-cost commodity hardware while ensuring a high degree of fault tolerance.

7. The need for data replication can arise in various scenarios like:
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned

View Answer

Answer: d [Reason:] Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.

8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication

View Answer

Answer: a [Reason:] A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them.

9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned

View Answer

Answer: b [Reason:] The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System (HDFS).
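
For illustration, the FS shell can also be driven programmatically from Java through the FsShell class; the sketch below (assuming the Hadoop client libraries are on the classpath) runs the equivalent of listing the root directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    public class FsShellExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same command set as the "hadoop fs" / "hdfs dfs" command line, e.g. -ls, -mkdir, -put.
            int exitCode = ToolRunner.run(conf, new FsShell(conf), new String[] {"-ls", "/"});
            System.exit(exitCode);
        }
    }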

10. HDFS is implemented in _____________ programming language.
a) C++
b) Java
c) Scala
d) None of the mentioned

View Answer

Answer: b [Reason:] HDFS is implemented in Java and any computer which can run Java can host a NameNode/DataNode on it.
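
Because HDFS is written in Java, it also ships a Java client API. A minimal sketch of writing and then reading a file (the commented-out fs.defaultFS address and the path /tmp/example.txt are placeholders, not values from the question set):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // conf.set("fs.defaultFS", "hdfs://namenode-host:8020");  // placeholder NameNode address
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/example.txt");

            try (FSDataOutputStream out = fs.create(path, true)) {   // overwrite if it exists
                out.writeUTF("hello hdfs");
            }
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }
        }
    }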

Hadoop MCQ Set 2

1. A ________ job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner.
a) Tasker
b) MapReduce
c) Tasktrack
d) None of the mentioned

View Answer

Answer: b [Reason:] Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data.
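
To make the map phase concrete, here is a small illustrative mapper in the spirit of word count (not part of the question set); each map task receives one input split and emits (word, 1) pairs independently of, and in parallel with, the other map tasks:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Each map task processes its own split of the input in parallel with the others.
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // (word, 1) pairs are later sorted and reduced
                }
            }
        }
    }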

2. Point out the correct statement:
a) Another limitation of the Hadoop MapReduce framework is its pull-based scheduling model
b) The MapReduce framework sorts the outputs of the maps, which are then input to the reduce tasks
c) The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks
d) All of the mentioned

View Answer

Answer: d [Reason:] Typically both the input and the output of the job are stored in a file-system.

3. Hadoop __________ is a utility which allows users to create and run jobs with any executables.
a) Streaming
b) Pipes
c) Orchestration
d) All of the mentioned

View Answer

Answer: a [Reason:] Applications specify the input/output locations and supply map and reduce functions.

4. Hadoop _________ is a SWIG-compatible C++ API to implement MapReduce applications.
a) Streaming
b) Pipes
c) Orchestration
d) All of the mentioned

View Answer

Answer: b [Reason:] The MapReduce framework operates exclusively on <key, value> pairs.

5. Point out the wrong statement:
a) MapReduce configuration allows the framework to effectively schedule tasks on the nodes where data is already present
b) Typically the compute nodes and the storage nodes are different
c) The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster-node
d) None of the mentioned

View Answer

Answer: b [Reason:] Typically the MapReduce framework and the Hadoop Distributed File System run on the same set of nodes.

6. The key and value classes have to be _________ by the MapReduce framework.
a) collected
b) serializable
c) compacted
d) none of the mentioned

View Answer

Answer: b [Reason:] The key and value classes need to implement the Writable interface so that the framework can serialize them.

7. Key classes have to implement the __________ interface to facilitate sorting by the framework.
a) Writable
b) Comparable
c) WritableComparable
d) None of the mentioned

View Answer

Answer: c [Reason:] Input and output types of a MapReduce job: (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output).
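
As a sketch of question 7, a custom key type only needs to implement WritableComparable so the framework can serialize it and sort it; the class and field names here (YearTemperatureKey, year, temperature) are invented purely for illustration:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    public class YearTemperatureKey implements WritableComparable<YearTemperatureKey> {
        private int year;
        private int temperature;

        @Override
        public void write(DataOutput out) throws IOException {    // serialization
            out.writeInt(year);
            out.writeInt(temperature);
        }

        @Override
        public void readFields(DataInput in) throws IOException { // deserialization
            year = in.readInt();
            temperature = in.readInt();
        }

        @Override
        public int compareTo(YearTemperatureKey other) {           // used by the sort phase
            int cmp = Integer.compare(year, other.year);
            return cmp != 0 ? cmp : Integer.compare(temperature, other.temperature);
        }

        // equals()/hashCode() should also be overridden so HashPartitioner spreads keys sensibly.
        @Override
        public boolean equals(Object o) {
            return o instanceof YearTemperatureKey
                    && ((YearTemperatureKey) o).year == year
                    && ((YearTemperatureKey) o).temperature == temperature;
        }

        @Override
        public int hashCode() {
            return 31 * year + temperature;
        }
    }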

8. The ________ option allows applications to add jars to the classpaths of the maps and reduces.
a) optionname
b) -libjars
c) -archives
d) all of the mentioned

View Answer

Answer: b [Reason:] Applications can specify a comma separated list of paths which would be present in the current working directory of the task using the option -files.

9. The option ___________ allows passing a comma-separated list of archives as arguments.
a) optionname
b) -libjars
c) -archives
d) none of the mentioned

View Answer

Answer: c [Reason:] These archives are unarchived and a link with name of the archive is created in the current working directory of tasks.
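
Questions 8 and 9 refer to the generic options (-libjars, -files, -archives). They are honoured only when a driver is run through ToolRunner/GenericOptionsParser, as in this hedged sketch (the class name and job name are made up; mapper, reducer and paths are left out for brevity):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class GenericOptionsDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // By the time run() is called, ToolRunner has already parsed generic options
            // such as -libjars, -files and -archives out of args and into getConf().
            Job job = Job.getInstance(getConf(), "generic-options-demo");
            job.setJarByClass(GenericOptionsDriver.class);
            // mapper, reducer and input/output paths would be configured here
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // Example invocation (illustrative only):
            //   hadoop jar demo.jar GenericOptionsDriver -libjars extra.jar -archives dict.zip <in> <out>
            System.exit(ToolRunner.run(new Configuration(), new GenericOptionsDriver(), args));
        }
    }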

10. Users can specify a different symbolic name for files and archives passed through the -files and -archives options, using:
a) $
b) @
c) #
d) $

View Answer

Answer: c [Reason:] The # symbol separates the file or archive path from its symbolic name, for example -files dir/dict.txt#dictionary.

Hadoop MCQ Set 3

1. The split size is normally the size of an ________ block, which is appropriate for most applications.
a) Generic
b) Task
c) Library
d) HDFS

View Answer

Answer: d [Reason:] FileInputFormat splits only large files (here “large” means larger than an HDFS block).

2. Point out the correct statement:
a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size
b) Applications may impose a minimum split size
c) The maximum split size defaults to the maximum value that can be represented by a Java long type
d) All of the mentioned

View Answer

Answer: a [Reason:] The maximum split size has an effect only when it is less than the block size, forcing splits to be smaller than a block.
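
A hedged sketch of how an application can impose its own split bounds through FileInputFormat (the 16 MB and 64 MB figures are arbitrary example values; in recent releases the underlying configuration keys are mapreduce.input.fileinputformat.split.minsize and .maxsize):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-size-demo");
            // Splits will be at least 16 MB and at most 64 MB (values chosen only for illustration).
            FileInputFormat.setMinInputSplitSize(job, 16L * 1024 * 1024);
            FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        }
    }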

3. Which of the following Hadoop streaming command option parameters is required?
a) output directoryname
b) mapper executable
c) input directoryname
d) All of the mentioned

View Answer

Answer: d [Reason:] The input location, output location, and mapper executable are all required parameters.

4. To set an environment variable in a streaming command use:
a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/

View Answer

Answer: c [Reason:] An environment variable is set using the -cmdenv option.

5. Point out the wrong statement:
a) Hadoop works better with a small number of large files than a large number of small files
b) CombineFileInputFormat is designed to work well with small files
c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
d) None of the mentioned

View Answer

Answer: c [Reason:] If the file is very small (“small” means significantly smaller than an HDFS block) and there are a lot of them, then each map task will process very little input, and there will be a lot of them (one per file), each of which imposes extra bookkeeping overhead.
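
As the answer above suggests, many small files hurt a MapReduce job. A hedged sketch of packing small files into larger splits with CombineTextInputFormat (a concrete subclass of CombineFileInputFormat); the 128 MB cap is an arbitrary example value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

    public class SmallFilesJobExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "small-files-demo");
            // Pack many small input files into combined splits instead of one split per file.
            job.setInputFormatClass(CombineTextInputFormat.class);
            CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        }
    }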

6. The ________ option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files.
a) archives
b) files
c) task
d) none of the mentioned

View Answer

Answer: a [Reason:] The -archives option is also a generic option.

7. ______________ class allows the Map/Reduce framework to partition the map outputs based on certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
d) None of the mentioned

View Answer

Answer: b [Reason:] The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.

8. Which of the following classes provides a subset of features provided by the Unix/GNU Sort?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
d) All of the mentioned

View Answer

Answer: c [Reason:] Hadoop has a library class, KeyFieldBasedComparator, that is useful for many applications.
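
Questions 7 and 8 can be tied together in one hedged Java sketch; the field specifications (-k1,1 and -k2,2nr) and the tab separator are illustrative choices, and the property names follow recent Hadoop documentation:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedComparator;
    import org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner;

    public class KeyFieldJobExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("mapreduce.map.output.key.field.separator", "\t");        // key fields are tab separated
            conf.set("mapreduce.partition.keypartitioner.options", "-k1,1");   // partition on field 1 only
            conf.set("mapreduce.partition.keycomparator.options", "-k2,2nr");  // sort on field 2, numeric, reversed

            Job job = Job.getInstance(conf, "key-field-demo");
            job.setPartitionerClass(KeyFieldBasedPartitioner.class);           // primary key -> partition
            job.setSortComparatorClass(KeyFieldBasedComparator.class);         // Unix-sort-like comparator
        }
    }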

9. Which of the following classes is provided by the Aggregate package?
a) Map
b) Reducer
c) Reduce
d) None of the mentioned

View Answer

Answer: b [Reason:] Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as “sum”, “max”, “min” and so on over a sequence of values.

10. Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the Unix ______ utility.
a) Copy
b) Cut
c) Paste
d) Move

View Answer

Answer: b [Reason:] The map function defined in the class treats each input key/value pair as a list of fields.

Hadoop MCQ Set 4

1. __________ storage is a solution to decouple growing storage capacity from compute capacity.
a) DataNode
b) Archival
c) Policy
d) None of the mentioned

View Answer

Answer: b [Reason:] Nodes with higher-density, less expensive storage and low compute power are becoming available.

2. Point out the correct statement:
a) When there is enough space, block replicas are stored according to the storage type list
b) One_SSD is used for storing all replicas in SSD
c) Hot policy is useful only for single replica blocks
d) All of the mentioned

View Answer

Answer: a [Reason:] The first phase of Heterogeneous Storage changed the DataNode storage model from a single storage to a collection of storages, each corresponding to a physical storage medium.

3. ___________ is added for supporting writing single replica files in memory.
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK
d) All of the mentioned

View Answer

Answer: c [Reason:] DISK is the default storage type.

4. Which of the following has high storage density?
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK
d) All of the mentioned

View Answer

Answer: b [Reason:] ARCHIVE has high storage density but little compute power; it is added for supporting archival storage.

5. Point out the wrong statement:
a) A Storage policy consists of the Policy ID
b) The storage policy can be specified using the “dfsadmin -setStoragePolicy” command
c) dfs.storage.policy.enabled is used for enabling/disabling the storage policy feature
d) None of the mentioned

View Answer

Answer: d [Reason:] The effective storage policy can be retrieved by the “dfsadmin -getStoragePolicy” command.
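
A hedged Java sketch of applying and reading back a storage policy on a directory (the path /data/warm and the WARM policy are example choices; on older Hadoop releases the same calls are exposed on DistributedFileSystem rather than FileSystem):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StoragePolicyExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path warmDir = new Path("/data/warm");          // hypothetical directory
            fs.setStoragePolicy(warmDir, "WARM");           // some replicas on DISK, the rest on ARCHIVE
            System.out.println(fs.getStoragePolicy(warmDir));
        }
    }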

6. Which of the following storage policies is used for both storage and compute?
a) Hot
b) Cold
c) Warm
d) All_SSD

View Answer

Answer: a [Reason:] When a block is hot, all replicas are stored in DISK.

7. Which of the following is only for storage with limited compute?
a) Hot
b) Cold
c) Warm
d) All_SSD

View Answer

Answer: b [Reason:] When a block is cold, all replicas are stored in ARCHIVE.

8. When a block is warm, some of its replicas are stored in DISK and the remaining replicas are stored in:
a) ROM_DISK
b) ARCHIVE
c) RAM_DISK
d) All of the mentioned

View Answer

Answer: b [Reason:] Warm storage policy is partially hot and partially cold.

9. ____________ is used for storing one of the replicas in SSD.
a) Hot
b) Lazy_Persist
c) One_SSD
d) All_SSD

View Answer

Answer: c [Reason:] The remaining replicas are stored in DISK.

10. ___________ is used for writing blocks with single replica in memory.
a) Hot
b) Lazy_Persist
c) One_SSD
d) All_SSD

View Answer

Answer: b [Reason:] The replica is first written in RAM_DISK and then it is lazily persisted in DISK.

Hadoop MCQ Set 5

1. Which of the following is a Java-based tool for tracking, resolving and managing project dependencies?
a) jclouds
b) JDO
c) Ivy
d) All of the mentioned

View Answer

Answer: c [Reason:] Ivy is a Java-based tool for tracking, resolving and managing project dependencies; jclouds, by contrast, is a cloud-agnostic library that enables developers to access a variety of supported cloud providers using one API.

2. Point out the correct statement:
a) Jena is a Java framework for building Semantic Web applications
b) JSPWiki is a Java-based wiki engine
c) jUDDI is an implementation of a Universal Description Discovery and Integration (UDDI) registry
d) All of the mentioned

View Answer

Answer: d [Reason:] jUDDI is a web service.

3. Which of the following is a content management and publishing system based on Cocoon?
a) LibCloud
b) Kafka
c) Lenya
d) All of the mentioned

View Answer

Answer: c [Reason:] Kafka is a distributed publish-subscribe system for processing large amounts of streaming data.

4. __________ is used for logging in the .NET framework.
a) log4net
b) logphp
c) Lucene.NET
d) All of the mentioned

View Answer

Answer: a [Reason:] log4net provides logging services for the .NET framework; Lucene.NET, by contrast, is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to .NET.

5. Point out the wrong statement:
a) Lucy is a loose port of the Lucene search engine library, written in C and targeted at dynamic language users
b) Manifold Connector Framework consists of connectors for content repositories like Sharepoint, Documentum, etc.
c) Marmotta is an open implementation of a Linked Data Platform
d) None of the mentioned

View Answer

Answer: d [Reason:] Marmotta was developed in the Apache Incubator, with Fabian Christ as its champion.

6. __________ is a cluster manager that provides resource sharing and isolation across cluster applications.
a) Merlin
b) Mesos
c) Max
d) Merge

View Answer

Answer: b [Reason:] The Merlin Eclipse plugin was merged with an existing Eclipse plugin already at Avalon.

7. Which of the following is a data access framework?
a) Merge
b) Lucene.NET
c) MetaModel
d) None of the mentioned

View Answer

Answer: c [Reason:] MetaModel provides a common interface for exploration and querying of different types of datastores.

8. __________ is a library to support unit testing of Hadoop MapReduce jobs.
a) Myfaces
b) Muse
c) modftp
d) None of the mentioned

View Answer

Answer: d [Reason:] MRUnit is a library to support unit testing of Hadoop MapReduce jobs.
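
For completeness, a small MRUnit test sketch (assuming the mrunit and JUnit jars are on the classpath, and a hypothetical TokenCountMapper like the one sketched in Set 2 is the mapper under test):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    public class TokenCountMapperTest {
        @Test
        public void emitsOnePerToken() throws Exception {
            MapDriver<LongWritable, Text, Text, IntWritable> driver =
                    MapDriver.newMapDriver(new TokenCountMapper());  // mapper under test (hypothetical)
            driver.withInput(new LongWritable(0), new Text("hello hello"))
                  .withOutput(new Text("hello"), new IntWritable(1))
                  .withOutput(new Text("hello"), new IntWritable(1))
                  .runTest();                                        // fails if actual output differs
        }
    }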

9. Which of the following is a robust implementation of the OASIS WSDM?
a) Myfaces
b) Muse
c) modftp
d) None of the mentioned

View Answer

Answer: b [Reason:] Muse implements the Management Using Web Services (MuWS) specification.

10. __________ is a framework for building Java Server application GUIs.
a) Myfaces
b) Muse
c) Flume
d) BigTop

View Answer

Answer: a [Reason:] MyFaces is a certified implementation of the JavaServer Faces specification (JSR-127).