Hadoop MCQ Set 1

1. _________ is a data migration tool added for archiving data.
a) Mover
b) Hiver
c) Serde
d) None of the mentioned

View Answer

Answer: a [Reason:] Mover periodically scans the files in HDFS to check if the block placement satisfies the storage policy.

2. Point out the correct statement:
a) Mover is not similar to Balancer
b) hdfs dfsadmin -setStoragePolicy puts a storage policy to a file or a directory.
c) addCacheArchive add archives to be localized
d) none of the mentioned

View Answer

Answer: c [Reason:] addCacheArchive(URI uri) adds an archive to be localized for the job; the related addArchiveToClassPath(Path archive) adds an archive path to the current set of classpath entries.
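
Both addCacheArchive(URI) and addArchiveToClassPath(Path) are methods of the MapReduce Job class. A minimal sketch of how they are called; the archive URI and jar path below are placeholders, not values from the question:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class DistributedCacheSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        // addCacheArchive(URI) adds an archive to be localized on every task node (option c)
        job.addCacheArchive(new URI("hdfs:///libs/dictionaries.zip"));
        // addArchiveToClassPath(Path) adds an archive path to the current set of classpath entries
        job.addArchiveToClassPath(new Path("/libs/extra-udfs.jar"));
    }
}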

3. Which of the following is used to list out the storage policies?
a) hdfs storagepolicies
b) hdfs storage
c) hd storagepolicies
d) all of the mentioned

View Answer

Answer: a [Reason:] The hdfs storagepolicies command takes no arguments and lists out all of the storage policies.

4. Which of the following commands can be used to get the storage policy of a file or a directory?
a) hdfs dfsadmin -getStoragePolicy path
b) hdfs dfsadmin -setStoragePolicy path policyName
c) hdfs dfsadmin -listStoragePolicy path policyName
d) all of the mentioned

View Answer

Answer: a [Reason:] The path argument refers to either a directory or a file.
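
The same storage-policy operations are also exposed through the Hadoop Java FileSystem API. A minimal sketch, assuming Hadoop 2.8 or later where setStoragePolicy/getStoragePolicy are available on FileSystem; the directory path and policy name are examples only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StoragePolicySketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/archive/logs");
        // Assign the built-in COLD policy so that new blocks go to ARCHIVE storage
        fs.setStoragePolicy(dir, "COLD");
        // Read the policy back, the programmatic counterpart of -getStoragePolicy
        System.out.println(fs.getStoragePolicy(dir).getName());
        fs.close();
    }
}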

5. Point out the wrong statement:
a) getInstance() creates a new Job with particular cluster
b) getInstance(Configuration conf) creates a new Job with no particular Cluster and a given Configuration
c) getInstance(JobStatus status, Configuration conf) creates a new Job with no particular Cluster and given Configuration and JobStatus.
d) all of the mentioned

View Answer

Answer: a [Reason:] getInstance() creates a new Job with no particular Cluster and a default Configuration, so statement (a) is wrong.
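
A short sketch of the Job factory methods compared above; the job name is only an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobFactorySketch {
    public static void main(String[] args) throws Exception {
        // No particular Cluster, default Configuration
        Job defaultJob = Job.getInstance();
        // No particular Cluster, caller-supplied Configuration (the Job copies it)
        Configuration conf = new Configuration();
        Job configuredJob = Job.getInstance(conf);
        configuredJob.setJobName("wordcount-example");
        System.out.println(configuredJob.getJobName());
    }
}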

6. Which of the following methods is used to get the user-specified job name?
a) getJobName()
b) getJobState()
c) getPriority()
d) all of the mentioned

View Answer

Answer: a [Reason:] getJobName() returns the user-specified job name, whereas getPriority() returns the scheduling information of the job.

7. __________ gets events indicating completion (success/failure) of component tasks.
a) getJobName()
b) getJobState()
c) getPriority()
d) getTaskCompletionEvents(int startFrom)

View Answer

Answer: d [Reason:] getTaskCompletionEvents(int startFrom) returns events indicating the completion (success/failure) of component tasks, starting from the given event number.

8. _________ gets the diagnostic messages for a given task attempt.
a) getTaskOutputFilter(Configuration conf)
b) getTaskReports(TaskType type)
c) getTrackingURL()
d) all of the mentioned

View Answer

Answer: a [Reason:] getTaskDiagnostics(TaskAttemptID taskid) gets the diagnostic messages for a given task attempt.

9. reduceProgress() gets the progress of the job’s reduce-tasks, as a float between:
a) 0.0-1.0
b) 1.0-2.0
c) 2.0-3.0
d) None of the mentioned

View Answer

Answer: a [Reason:] reduceProgress(), like mapProgress() for the map-tasks, returns the job’s progress as a float between 0.0 and 1.0.
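
The monitoring methods from questions 6-9 can be combined into a small helper. A sketch only, assuming a Job handle obtained elsewhere:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.TaskCompletionEvent;

public class JobMonitorSketch {
    public static void report(Job job) throws Exception {
        System.out.println("Name:   " + job.getJobName());      // user-specified job name
        System.out.println("Map:    " + job.mapProgress());     // float between 0.0 and 1.0
        System.out.println("Reduce: " + job.reduceProgress());  // float between 0.0 and 1.0
        // Completion (success/failure) events of component tasks, starting at event 0
        for (TaskCompletionEvent event : job.getTaskCompletionEvents(0)) {
            System.out.println(event.getTaskAttemptId() + " -> " + event.getStatus());
        }
    }
}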

10. The Job makes a copy of the _____________ so that any necessary internal modifications do not reflect on the incoming parameter.
a) Component
b) Configuration
c) Collector
d) None of the mentioned

View Answer

Answer: b [Reason:] A Cluster will be created from the conf parameter only when it’s needed.

Hadoop MCQ Set 2

1. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication

View Answer

Answer: c [Reason:] For YARN, the ResourceManager web UI provides host and port information, along with details of the cluster nodes and running applications.

2. Point out the correct statement:
a) The Hadoop framework publishes the job flow status to an internally running web server on the master nodes of the Hadoop cluster
b) Each incoming file is broken into 32 MB by default
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of fault tolerance
d) None of the mentioned

View Answer

Answer: a [Reason:] The Hadoop framework publishes the job flow status through web interfaces running on the master nodes; by contrast, incoming files are broken into 64 MB (or 128 MB) blocks by default, not 32 MB, and replication provides a high, not low, degree of fault tolerance.

3. For ________, the HBase Master UI provides information about the HBase Master uptime.
a) HBase
b) Oozie
c) Kafka
d) All of the mentioned

View Answer

Answer: a [Reason:] HBase Master UI provides information about the number of live, dead and transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.

4. ________ NameNode is used when the Primary NameNode goes down.
a) Rack
b) Data
c) Secondary
d) None of the mentioned

View Answer

Answer: c [Reason:] The Secondary NameNode is used to provide availability and reliability when the primary NameNode goes down.

5. Point out the wrong statement:
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that DataNode
c) User data is stored on the local file system of DataNodes
d) DataNode is aware of the files to which the blocks stored on it belong to

View Answer

Answer: d [Reason:] A DataNode is not aware of the files to which its stored blocks belong; that mapping is maintained by the NameNode.
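
As option (a) of question 5 notes, the replication factor can be set at the cluster level and at the file level. A minimal sketch of both via the Java API; the property value and file path are examples only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);                 // cluster/default level (default is 3)
        FileSystem fs = FileSystem.get(conf);
        fs.setReplication(new Path("/data/example.txt"), (short) 2);  // per-file override
        fs.close();
    }
}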

6. Which of the following scenarios may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
d) None of the mentioned

View Answer

Answer: a [Reason:] HDFS can be used for storing archive data since it is cheaper as HDFS allows storing the data on low cost commodity hardware while ensuring a high degree of fault-tolerance.

7. The need for data replication can arise in various scenarios like:
a) Replication Factor is changed
b) DataNode goes down
c) Data Blocks get corrupted
d) All of the mentioned

View Answer

Answer: d [Reason:] Data is replicated across different DataNodes to ensure a high degree of fault-tolerance.

8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication

View Answer

Answer: a [Reason:] A DataNode stores data in the Hadoop file system. A functional filesystem has more than one DataNode, with data replicated across them.

9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
d) None of the mentioned

View Answer

Answer: b [Reason:] The File System (FS) shell includes various shell-like commands that directly interact with the Hadoop Distributed File System.
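
The FS shell commands have programmatic counterparts on the Java FileSystem class. A sketch of a directory listing, roughly what the shell listing command prints; the path is an example only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsShellCounterpartSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Programmatic counterpart of the FS shell listing command
        for (FileStatus status : fs.listStatus(new Path("/user"))) {
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}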

10. During start up, the ___________ loads the file system state from the fsimage and the edits log file.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned

View Answer

Answer: b [Reason:] During start up, the NameNode loads the file system state from the fsimage and replays the edits log; any computer that can run Java can host a NameNode or DataNode.

Hadoop MCQ Set 3

1. __________ is a server based Bundle Engine that provides a higher-level oozie abstraction that will batch a set of coordinator applications.
a) Oozie v2
b) Oozie v3
c) Oozie v4
d) Oozie v5

View Answer

Answer: b [Reason:] Oozie v3 is the server-based Bundle Engine that batches a set of coordinator applications; Oozie combines multiple jobs sequentially into one logical unit of work.

2. Point out the correct statement:
a) Oozie is a scalable, reliable and extensible system
b) Oozie is a server-based Workflow Engine specialized in running workflow jobs
c) Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability
d) All of the mentioned

View Answer

Answer: d [Reason:] Oozie workflow jobs include MapReduce and streaming jobs as well as system-specific jobs such as Java programs and shell scripts.

3. ___________ is a Java Web application used to schedule Apache Hadoop jobs.
a) Impala
b) Oozie
c) Mahout
d) All of the mentioned

View Answer

Answer: b [Reason:] Oozie is a workflow scheduler system to manage Hadoop jobs.

4. Oozie Workflow jobs are Directed ________ graphs of actions.
a) Acyclical
b) Cyclical
c) Elliptical
d) All of the mentioned

View Answer

Answer: a [Reason:] Oozie is a framework that allows combining multiple MapReduce jobs into one logical unit of work.

5. Point out the wrong statement:
a) Oozie v2 is a server based Coordinator Engine specialized in running workflows based on time and data triggers
b) Oozie v1 is a server based Workflow Engine specialized in running workflow jobs with actions that execute Hadoop Map/Reduce and Pig jobs
c) A Workflow application is a DAG that coordinates the following types of actions
d) None of the mentioned

View Answer

Answer: d [Reason:] Cycles in workflows are not supported.

6. Oozie v2 is a server based ___________ Engine specialized in running workflows based on time and data triggers
a) Compactor
b) Collector
c) Coordinator
d) All of the mentioned

View Answer

Answer: c [Reason:] Oozie v2 can continuously run workflows based on time and data.

7. Which of the following is one of the possible states for a workflow job?
a) PREP
b) START
c) RESUME
d) END

View Answer

Answer: a [Reason:] The possible states for a workflow job are PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED and FAILED.
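
These states can be read through the Oozie Java client API. A minimal sketch; the Oozie URL and workflow job id are placeholders:

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieStatusSketch {
    public static void main(String[] args) throws Exception {
        OozieClient client = new OozieClient("http://oozie-host:11000/oozie");
        WorkflowJob job = client.getJobInfo("0000001-000000000000001-oozie-W");
        // One of PREP, RUNNING, SUSPENDED, SUCCEEDED, KILLED or FAILED
        System.out.println(job.getStatus());
    }
}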

8. Oozie can make _________ callback notifications on action start events and workflow end events.
a) TCP
b) HTTP
c) IP
d) All of the mentioned

View Answer

Answer: b [Reason:] In the case of an action start failure in a workflow job, depending on the type of failure, Oozie will attempt an automatic retry, request a manual retry, or fail the workflow job.

9. A workflow definition is a ______ with control flow nodes or action nodes.
a) CAG
b) DAG
c) BAG
d) None of the mentioned

View Answer

Answer: b [Reason:] Oozie does not support cycles in workflow definitions; a workflow definition must be a strict DAG.

10. Which of the following workflow definition languages is XML-based?
a) hpDL
b) hDL
c) hiDL
d) none of the mentioned

View Answer

Answer: a [Reason:] hpDL stands for Hadoop Process Definition Language.

Hadoop MCQ Set 4

1. A collection of various actions in a control dependency DAG is referred to as:
a) workflow
b) dataflow
c) clientflow
d) none of the mentioned

View Answer

Answer: a [Reason:] A workflow is a collection of actions arranged in a control dependency DAG; Falcon provides the key services such data processing applications need.

2. Point out the correct statement:
a) Large datasets are incentives for users to come to Hadoop
b) Data management is a common concern to be offered as a service
c) Understanding the life-time of a feed will allow for implicit validation of the processing rules
d) All of the mentioned

View Answer

Answer: d [Reason:] Falcon decouples a data location and its properties from workflows.

3. The ability of Hadoop to efficiently process large volumes of data in parallel is called __________ processing.
a) batch
b) stream
c) time
d) all of the mentioned

View Answer

Answer: a [Reason:] Batch processing is Hadoop’s ability to efficiently process large volumes of data in parallel; there are also a number of use cases that require more “real-time” processing of data as it arrives, rather than through batch processing.

4. __________ is used for simplified Data Management in Hadoop.
a) Falcon
b) flume
c) Impala
d) none of the mentioned

View Answer

Answer: a [Reason:] Apache Falcon handles process orchestration and scheduling, which simplifies data management on Hadoop.

5. Point out the wrong statement:
a) Falcon promotes Javascript Programming
b) Falcon does not do any heavy lifting but delegates to tools within the Hadoop ecosystem
c) Falcon handles retry logic and late data processing. Records audit, lineage and metrics
d) All of the mentioned

View Answer

Answer: a [Reason:] Falcon promotes Polyglot Programming.

6. Falcon provides ___________ workflow for copying data from source to target.
a) recurring
b) investment
c) data
d) none of the mentioned

View Answer

Answer: a [Reason:] Falcon instruments workflows for dependencies, retry logic, Table/Partition registration, notifications, etc.

7. A recurring workflow is used for purging expired data on __________ cluster.
a) Primary
b) Secondary
c) BCP
d) None of the mentioned

View Answer

Answer: a [Reason:] Falcon provides retention workflow for each cluster based on the defined policy.

8. Falcon provides the key services data processing applications need, so that sophisticated ________ can easily be added to Hadoop applications.
a) DAM
b) DLM
c) DCM
d) All of the mentioned

View Answer

Answer: b [Reason:] Complex data processing logic is handled by Falcon instead of hard-coded in apps.

9. Falcon promotes decoupling of data set location from ___________ definition.
a) Oozie
b) Impala
c) Kafka
d) Thrift

View Answer

Answer: a [Reason:] Falcon uses declarative processing with simple directives enabling rapid prototyping.

10. Falcon provides seamless integration with:
a) HCatalog
b) metastore
c) HBase
d) Kafka

View Answer

Answer: b [Reason:] Falcon maintains the dependencies and relationships between entities.

Hadoop MCQ Set 5

1. _________ operator is used to review the schema of a relation.
a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN

View Answer

Answer: b [Reason:] DESCRIBE returns the schema of a relation.

2. Point out the correct statement:
a) During the testing phase of your implementation, you can use LOAD to display results to your terminal screen
b) You can view outer relations as well as relations defined in a nested FOREACH statement
c) Hadoop properties are interpreted by Pig
d) None of the mentioned

View Answer

Answer: b [Reason:] Viewing outer relations, as well as relations defined in a nested FOREACH statement, is possible using the DESCRIBE operator.

3. Which of the following operators is used to view the MapReduce execution plans?
a) DUMP
b) DESCRIBE
c) STORE
d) EXPLAIN

View Answer

Answer: d [Reason:] EXPLAIN displays execution plans.
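
DESCRIBE and EXPLAIN have programmatic counterparts on the PigServer class (dumpSchema and explain). A sketch in local mode; the input file and field names are placeholders:

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class PigDiagnosticsSketch {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("records = LOAD 'input.txt' AS (name:chararray, score:int);");
        Schema schema = pig.dumpSchema("records");   // counterpart of DESCRIBE
        System.out.println(schema);
        pig.explain("records", System.out);          // counterpart of EXPLAIN: writes the execution plans
    }
}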

4. ___________ operator is used to view the step-by-step execution of a series of statements.
a) ILLUSTRATE
b) DESCRIBE
c) STORE
d) EXPLAIN

View Answer

Answer: a [Reason:] ILLUSTRATE allows you to test your programs on small datasets and get faster turnaround times.

5. Point out the wrong statement:
a) ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements
b) ILLUSTRATE is based on an example generator
c) Several new private classes make it harder for external tools such as Oozie to integrate with Pig statistics
d) None of the mentioned

View Answer

Answer: c [Reason:] Several new public classes make it easier for external tools such as Oozie to integrate with Pig statistics.

6. __________ is a framework for collecting and storing script-level statistics for Pig Latin.
a) Pig Stats
b) PStatistics
c) Pig Statistics
d) None of the mentioned

View Answer

Answer: c [Reason:] The new Pig statistics and the existing Hadoop statistics can also be accessed via the Hadoop job history file.

7. The ________ class mimics the behavior of the Main class but gives users a statistics object back.
a) PigRun
b) PigRunner
c) RunnerPig
d) None of the mentioned

View Answer

Answer: b [Reason:] Optionally, you can call the API with an implementation of progress listener which will be invoked by Pig runtime during the execution.
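
A minimal sketch of calling PigRunner directly; the script name is a placeholder, and a PigProgressNotificationListener can be supplied instead of null:

import org.apache.pig.PigRunner;
import org.apache.pig.tools.pigstats.PigStats;

public class PigRunnerSketch {
    public static void main(String[] args) {
        // Same arguments as the pig command line; a progress listener may replace the null
        String[] pigArgs = { "-x", "local", "myscript.pig" };
        PigStats stats = PigRunner.run(pigArgs, null);
        System.out.println("Succeeded: " + stats.isSuccessful());
    }
}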

8. ___________ is a simple xUnit framework that enables you to easily test your Pig scripts.
a) PigUnit
b) PigXUnit
c) PigUnitX
d) All of the mentioned

View Answer

Answer: a [Reason:] With PigUnit you can perform unit testing, regression testing, and rapid prototyping. No cluster set up is required if you run Pig in local mode.

9. Which of the following will compile PigUnit?
a) $pig_trunk ant pigunit-jar
b) $pig_tr ant pigunit-jar
c) $pig_ ant pigunit-jar
d) None of the mentioned

View Answer

Answer: a [Reason:] The compile will create the pigunit.jar file.

10. PigUnit runs in Pig’s _______ mode by default.
a) local
b) tez
c) mapreduce
d) none of the mentioned

View Answer

Answer: a [Reason:] Local mode does not require a real cluster but a new local one is created each time.
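
A minimal PigUnit sketch, running in the default local mode; the script lines, alias names and sample data are illustrative only:

import org.apache.pig.pigunit.PigTest;

public class PigUnitSketch {
    public static void main(String[] args) throws Exception {
        String[] script = {
            "data = LOAD 'input' AS (name:chararray, score:int);",
            "top = ORDER data BY score DESC;",
        };
        String[] input  = { "alice\t3", "bob\t5" };
        String[] output = { "(bob,5)", "(alice,3)" };
        PigTest test = new PigTest(script);
        // Overrides the LOAD input with the given records and checks the output of the 'top' alias
        test.assertOutput("data", input, "top", output);
    }
}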