Hadoop MCQ Set 1

1. ___________ generates keys of type LongWritable and values of type Text.
a) TextOutputFormat
b) TextInputFormat
c) OutputInputFormat
d) None of the mentioned

View Answer

Answer: b [Reason:] TextInputFormat is the default InputFormat. Each key is the byte offset of the start of the line within the file (a LongWritable), and each value is the contents of the line itself (a Text).
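
For reference, a minimal mapper against the old org.apache.hadoop.mapred API whose input types match what TextInputFormat produces; the class name and the emitted "length" key are made up for this sketch:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // TextInputFormat hands map() the byte offset of the line (LongWritable)
    // as the key and the line itself (Text) as the value.
    public class LineLengthMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      public void map(LongWritable offset, Text line,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
        // Emit one record per input line; the "length" key is illustrative.
        output.collect(new Text("length"), new IntWritable(line.getLength()));
      }
    }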

2. Point out the correct statement:
a) The reduce input must have the same types as the map output, although the reduce output types may be different again
b) The map input key and value types (K1 and V1) are different from the map output types
c) The partition function operates on the intermediate key
d) All of the mentioned

View Answer

Answer: d [Reason:] In practice, the partition is determined solely by the key (the value is ignored).
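
The K1..V3 type relationships are declared on the job itself. A short sketch of the relevant JobConf calls, with Text and IntWritable chosen purely for illustration:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    public class TypeSetup {
      public static JobConf configure() {
        JobConf conf = new JobConf();
        // K2/V2: the intermediate types emitted by the mapper.
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        // K3/V3: the final types emitted by the reducer. When K2 == K3
        // (or V2 == V3) the setMapOutput*Class() call above may be omitted.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        return conf;
      }
    }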

3. In _____________, the default job is similar, but not identical, to the Java equivalent.
a) Mapreduce
b) Streaming
c) Orchestration
d) All of the mentioned

View Answer

Answer: b [Reason:] In Streaming, the map and reduce functions are supplied by external programs, so the default Streaming job is similar to, but not identical with, the default Java job.

4. An input _________ is a chunk of the input that is processed by a single map.
a) textformat
b) split
c) datanode
d) all of the mentioned

View Answer

Answer: b [Reason:] Each split is divided into records, and the map processes each record—a key-value pair—in turn.

5. Point out the wrong statement:
a) If V2 and V3 are the same, you only need to use setOutputValueClass()
b) The overall effect of a Streaming job is to perform a sort of the input
c) A Streaming application can control the separator that is used when a key-value pair is turned into a series of bytes and sent to the map or reduce process over standard input
d) None of the mentioned

View Answer

Answer: d [Reason:] If a combine function is used, it has the same form as the reduce function, except that its output types are the intermediate key and value types (K2 and V2), so that it can feed the reduce function.

6. An ___________ is responsible for creating the input splits, and dividing them into records.
a) TextOutputFormat
b) TextInputFormat
c) OutputInputFormat
d) InputFormat

View Answer

Answer: d [Reason:] As a MapReduce application writer, you don’t need to deal with InputSplits directly, as they are created by an InputFormat.
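
To make the split/record distinction concrete, here is a simplified sketch of the loop a map task effectively runs over one split, using the old-API RecordReader as an iterator; a real task would pass each record to map() rather than count it:

    import java.io.IOException;
    import org.apache.hadoop.mapred.RecordReader;

    public class SplitRunner {
      // The RecordReader (obtained from InputFormat.getRecordReader())
      // yields the records of a single split until it is exhausted.
      public static <K, V> long countRecords(RecordReader<K, V> reader)
          throws IOException {
        K key = reader.createKey();
        V value = reader.createValue();
        long records = 0;
        while (reader.next(key, value)) { // false once the split is exhausted
          records++;                      // a real task calls map(key, value, ...)
        }
        reader.close();
        return records;
      }
    }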

7. ______________ is another implementation of the MapRunnable interface that runs mappers concurrently in a configurable number of threads.
a) MultithreadedRunner
b) MultithreadedMap
c) MultithreadedMapRunner
d) SinglethreadedMapRunner

View Answer

Answer: c [Reason:] MultithreadedMapRunner is an implementation of MapRunnable that runs mappers concurrently in a configurable number of threads; this helps when individual records take a long time to process for reasons other than CPU, such as external I/O.

8. Which of the following is the only way of running mappers?
a) MapReducer
b) MapRunner
c) MapRed
d) All of the mentioned

View Answer

Answer: b [Reason:] MapRunner is the default implementation of MapRunnable; it iterates over the records of the split and invokes the Mapper's map() method for each one.

9. _________ is the base class for all implementations of InputFormat that use files as their data source.
a) FileTextFormat
b) FileInputFormat
c) FileOutputFormat
d) None of the mentioned

View Answer

Answer: b [Reason:] FileInputFormat provides implementation for generating splits for the input files.

10. Which of the following methods adds a path or paths to the list of inputs?
a) setInputPaths()
b) addInputPath()
c) setInput()
d) none of the mentioned

View Answer

Answer: b [Reason:] FileInputFormat offers four static convenience methods for setting a JobConf’s input paths.
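
A sketch of those convenience methods on the old FileInputFormat API; the paths below are hypothetical:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class InputPaths {
      public static void configure(JobConf conf) {
        // setInputPaths() replaces whatever paths were set so far.
        FileInputFormat.setInputPaths(conf,
            new Path("/data/2014"), new Path("/data/2015"));
        // addInputPath() appends a single path to the list of inputs.
        FileInputFormat.addInputPath(conf, new Path("/data/2016"));
        // addInputPaths() appends a comma-separated list of paths.
        FileInputFormat.addInputPaths(conf, "/data/extra1,/data/extra2");
      }
    }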

Hadoop MCQ Set 2

1. Apache Flume 1.3.0 is the fourth release under the auspices of Apache of the so-called ________ codeline.
a) NG
b) ND
c) NF
d) NR

View Answer

Answer: a [Reason:] Flume 1.3.0 has been put through many stress and regression tests, is stable, production-ready software, and is backwards-compatible with Flume 1.2.0.

2. Point out the correct statement:
a) Flume is a distributed, reliable, and available service
b) Version 1.5.2 is the eighth Flume release as an Apache top-level project
c) Flume 1.5.2 is production-ready software for integration with hadoop
d) All of the mentioned

View Answer

Answer: a [Reason:] Flume is used for efficiently collecting, aggregating, and moving large amounts of streaming event data.

3. ___________ was created to allow you to flow data from a source into your Hadoop environment.
a) Impala
b) Oozie
c) Flume
d) All of the mentioned

View Answer

Answer: c [Reason:] In Flume, the entities you work with are called sources, decorators, and sinks.

4. A ____________ is an operation on the stream that can transform the stream.
a) Decorator
b) Source
c) Sinks
d) All of the mentioned

View Answer

Answer: a [Reason:] A decorator is an operation on the stream that can transform the stream in some manner, such as compressing it or adding and removing metadata; a source, by contrast, is where data enters the flow, and Flume has many predefined source adapters.

5. Point out the wrong statement:
a) Version 1.4.0 is the fourth Flume release as an Apache top-level project
b) Apache Flume 1.5.2 is a security and maintenance release that disables SSLv3 on all components in Flume that support SSL/TLS
c) Flume is backwards-compatible with previous versions of the Flume 1.x codeline
d) None of the mentioned

View Answer

Answer: d [Reason:] Apache Flume 1.3.1 is a maintenance release for the 1.3.0 release, and includes several bug fixes and performance enhancements.

6. A number of ____________ source adapters give you the granular control to grab a specific file.
a) multimedia file
b) text file
c) image file
d) none of the mentioned

View Answer

Answer: b [Reason:] A number of predefined source adapters are built into Flume.

7. ____________ is used when you want the sink to be the input source for another operation.
a) Collector Tier Event
b) Agent Tier Event
c) Basic
d) All of the mentioned

View Answer

Answer: b [Reason:] An Agent Tier Event sink is used when you want the sink to be the input source for another operation; Flume sends back acknowledgments so the sender knows the data actually arrived.

8. ___________ is where you would land a flow (or possibly multiple flows joined together) into an HDFS-formatted file system.
a) Collector Tier Event
b) Agent Tier Event
c) Basic
d) All of the mentioned

View Answer

Answer: a [Reason:] A Collector Tier Event sink is where you land a flow (or possibly several flows joined together) into an HDFS-formatted file system.

9. ____________ sink can be a text file, the console display, a simple HDFS path, or a null bucket where the data is simply deleted.
a) Collector Tier Event
b) Agent Tier Event
c) Basic
d) None of the mentioned

View Answer

Answer: c [Reason:] Flume will also ensure the integrity of the flow by sending back acknowledgments that data has actually arrived at the sink.

10. Flume deploys as one or more agents, each contained within its own instance of:
a) JVM
b) Channels
c) Chunks
d) None of the mentioned

View Answer

Answer: a [Reason:] Each Flume agent runs inside its own JVM instance and must have at least one source, one channel, and one sink in order to run.

Hadoop MCQ Set 3

1. Which of the following is the default Partitioner for MapReduce?
a) MergePartitioner
b) HashedPartitioner
c) HashPartitioner
d) None of the mentioned

View Answer

Answer: c [Reason:] The total number of partitions is the same as the number of reduce tasks for the job.
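
For reference, the default HashPartitioner amounts to hashing the key modulo the number of reduce tasks, so equal keys always reach the same reducer. A sketch with equivalent logic against the old Partitioner interface (the class name is made up):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    public class HashLikePartitioner<K, V> implements Partitioner<K, V> {
      public void configure(JobConf job) {}

      public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the partition index is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
    }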

2. Point out the correct statement:
a) The right number of reduces seems to be 0.95 or 1.75
b) Increasing the number of reduces increases the framework overhead
c) With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish
d) All of the mentioned

View Answer

Answer: c [Reason:] With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces doing a much better job of load balancing.

3. Which of the following partitions the key space?
a) Partitioner
b) Compactor
c) Collector
d) All of the mentioned

View Answer

Answer: a [Reason:] Partitioner controls the partitioning of the keys of the intermediate map-outputs.

4. ____________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) OutputCompactor
b) OutputCollector
c) InputCollector
d) All of the mentioned

View Answer

Answer: b [Reason:] OutputCollector generalizes the facility the MapReduce framework provides for collecting data output by the Mapper or the Reducer, whether intermediate outputs or the final output of the job.
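
A minimal old-API reducer showing the OutputCollector in use; the summing logic is just an example:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class SumReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        // collect() is the single facility for emitting (key, value) pairs
        // from either a Mapper or a Reducer.
        output.collect(key, new IntWritable(sum));
      }
    }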

5. Point out the wrong statement:
a) It is legal to set the number of reduce-tasks to zero if no reduction is desired
b) The outputs of the map-tasks go directly to the FileSystem
c) The Mapreduce framework does not sort the map-outputs before writing them out to the FileSystem
d) None of the mentioned

View Answer

Answer: d [Reason:] Outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path).

6. __________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
a) JobConfig
b) JobConf
c) JobConfiguration
d) All of the mentioned

View Answer

Answer: b [Reason:] JobConf is typically used to specify the Mapper, combiner (if any), Partitioner, Reducer, InputFormat, OutputFormat and OutputCommitter implementations.
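
A sketch of a driver that uses JobConf to describe a complete job, wiring together the LineLengthMapper and SumReducer sketched earlier; input and output paths come from the command line:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LineLengthDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LineLengthDriver.class);
        conf.setJobName("line-length");
        // JobConf carries the whole job description: mapper, reducer,
        // output types, and input/output paths.
        conf.setMapperClass(LineLengthMapper.class);
        conf.setReducerClass(SumReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // submit and wait for completion
      }
    }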

7. The ___________ executes the Mapper/ Reducer task as a child process in a separate jvm.
a) JobTracker
b) TaskTracker
c) TaskScheduler
d) None of the mentioned

View Answer

Answer: b [Reason:] The TaskTracker spawns each Mapper/Reducer task as a child process in a separate JVM, and the child task inherits the environment of its parent TaskTracker.

8. Maximum virtual memory of the launched child-task is specified using:
a) mapv
b) mapred
c) mapvim
d) All of the mentioned

View Answer

Answer: b [Reason:] Admins can also specify the maximum virtual memory of the launched child-task, and of any sub-process it launches recursively, using mapred.child.ulimit.

9. Which of the following parameters is the threshold for the accounting and serialization buffers?
a) io.sort.spill.percent
b) io.sort.record.percent
c) io.sort.mb
d) None of the mentioned

View Answer

Answer: a [Reason:] Once either the accounting or the serialization buffer fills past this threshold, its contents are spilled to disk in the background.

10. ______________ is the percentage of memory, relative to the maximum heap size, in which map outputs may be retained during the reduce.
a) mapred.job.shuffle.merge.percent
b) mapred.job.reduce.input.buffer.percent
c) mapred.inmem.merge.threshold
d) io.sort.factor

View Answer

Answer: b [Reason:] When the reduce begins, map outputs will be merged to disk until those that remain are under the resource limit this defines.
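
A sketch of setting these old-style (pre-YARN) shuffle parameters programmatically; the values are purely illustrative, not tuning advice:

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuning {
      public static void tune(JobConf conf) {
        // Spill threshold for the accounting/serialization buffers.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        // Size of the map-side sort buffer, in megabytes.
        conf.setInt("io.sort.mb", 100);
        // Fraction of the reduce heap that may retain map outputs.
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);
      }
    }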

Hadoop MCQ Set 4

1. _________ is the name of the archive you would like to create.
a) archive
b) archiveName
c) name
d) none of the mentioned

View Answer

Answer: b [Reason:] The name should have a *.har extension.

2. Point out the correct statement:
a) A Hadoop archive maps to a file system directory
b) Hadoop archives are special format archives
c) A Hadoop archive always has a *.har extension
d) All of the mentioned

View Answer

Answer: d [Reason:] A Hadoop archive directory contains metadata (in the form of _index and _masterindex) and data (part-*) files.

3. Using Hadoop Archives in __________ is as easy as specifying a different input filesystem than the default file system.
a) Hive
b) Pig
c) MapReduce
d) All of the mentioned

View Answer

Answer: c [Reason:] Because a Hadoop Archive is exposed as a file system, MapReduce can use all the logical input files in the archive as its input.

4. The __________ guarantees that excess resources taken from a queue will be restored to it within N minutes of its need for them.
a) capacitor
b) scheduler
c) datanode
d) none of the mentioned

View Answer

Answer: b [Reason:] Free resources can be allocated to any queue beyond its guaranteed capacity.

5. Point out the wrong statement:
a) The Hadoop archive exposes itself as a file system layer
b) Hadoop archives are immutable
c) Archive renames, deletes, and creates return an error
d) None of the mentioned

View Answer

Answer: d [Reason:] All the fs shell commands in the archives work but with a different URI.

6. _________ is a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.
a) Flow Scheduler
b) Data Scheduler
c) Capacity Scheduler
d) None of the mentioned

View Answer

Answer: c [Reason:] The Capacity Scheduler supports multiple queues, and a job is submitted to a queue.

7. Which of the following parameters describes the destination directory that will contain the archive?
a) -archiveName
b) <source>
c) <destination>
d) none of the mentioned

View Answer

Answer: c [Reason:] -archiveName is the name of the archive to be created.

8. _________ identifies filesystem pathnames which work as usual with regular expressions.
a) -archiveName <name>
b) <source>
c) <destination>
d) none of the mentioned

View Answer

Answer: b [Reason:] <source> identifies the filesystem pathnames, which work as usual with regular expressions; <destination> identifies the destination directory that will contain the archive.

9. __________ is the parent argument used to specify the relative path to which the files should be archived.
a) -archiveName <name>
b) -p <parent_path>
c) <destination>
d) <source>

View Answer

Answer: b [Reason:] The hadoop archive command creates a Hadoop archive, a file that contains other files.

10. Which of the following is a valid syntax for the hadoop archive command?
a)

 hadooparchive [ Generic Options ] archive
    -archiveName <name>
    [-p <parent>]
    <source>
    <destination>

b)

 hadooparch [ Generic Options ] archive
    -archiveName <name>
    [-p <parent>]
    <source>
    <destination>

c)

 hadoop [ Generic Options ] archive
    -archiveName <name>
    [-p <parent>]
    <source>
    <destination>

d) None of the mentioned

View Answer

Answer: c [Reason:] The Hadoop archiving tool is invoked as hadoop archive -archiveName <name> -p <parent> <src>* <dest>. For example, hadoop archive -archiveName foo.har -p /user/hadoop dir1 dir2 /user/zoo creates an archive foo.har of dir1 and dir2 (taken relative to /user/hadoop) in the directory /user/zoo.

Hadoop MCQ Set 5

1. Which of the following classes provides access to configuration parameters?
a) Config
b) Configuration
c) OutputConfig
d) None of the mentioned

View Answer

Answer: b [Reason:] Configurations are specified by resources.
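
A minimal sketch of the Configuration class in use; the resource path and property names here are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    public class ConfDemo {
      public static void main(String[] args) {
        // Unless explicitly turned off, Hadoop loads core-default.xml and
        // core-site.xml by default; further resources can be added.
        Configuration conf = new Configuration();
        conf.addResource(new Path("/etc/hadoop/extra-site.xml")); // hypothetical
        // Typed accessors with fallback defaults:
        int port = conf.getInt("my.app.port", 8020);
        String name = conf.get("my.app.name", "demo");
        System.out.println(name + " -> " + port);
      }
    }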

2. Point out the correct statement:
a) Configuration parameters may be declared final
b) Unless explicitly turned off, Hadoop by default specifies two resources
c) Configuration class provides access to configuration parameters
d) None of the mentioned

View Answer

Answer: a [Reason:] Once a resource declares a value final, no subsequently-loaded resource can alter that value.

3. ___________ gives the site-specific configuration for a given Hadoop installation.
a) core-default.xml
b) core-site.xml
c) coredefault.xml
d) all of the mentioned

View Answer

Answer: b [Reason:] core-default.xml contains the read-only defaults for Hadoop, while core-site.xml carries the site-specific configuration.

4. Administrators typically define parameters as final in __________ for values that user applications may not alter.
a) core-default.xml
b) core-site.xml
c) coredefault.xml
d) all of the mentioned

View Answer

Answer: b [Reason:] Once a parameter is declared final in core-site.xml, user applications cannot override its value from their own configuration resources.

5. Point out the wrong statement:
a) addDeprecations adds a set of deprecated keys to the global deprecations
b) configuration parameters cannot be declared final
c) addDeprecations method is lockless
d) none of the mentioned

View Answer

Answer: b [Reason:] Configuration parameters may be declared final.

6. _________ method clears all keys from the configuration.
a) clear
b) addResource
c) getClass
d) none of the mentioned

View Answer

Answer: a [Reason:] getClass is used to get the value of the name property as a Class.

7. ________ method adds the deprecated key to the global deprecation map.
a) addDeprecits
b) addDeprecation
c) keyDeprecation
d) none of the mentioned

View Answer

Answer: b [Reason:] addDeprecation does not override any existing entries in the deprecation map.

8. ________ checks whether the given key is deprecated.
a) isDeprecated
b) setDeprecated
c) isDeprecatedif
d) all of the mentioned

View Answer

Answer: a [Reason:] The method returns true if the given key is deprecated and false otherwise.
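
A short sketch of the deprecation helpers, assuming a Hadoop 2.x Configuration; the key names are made up:

    import org.apache.hadoop.conf.Configuration;

    public class DeprecationDemo {
      public static void main(String[] args) {
        // Map an old key to its replacement; reads and writes of the old
        // key are transparently forwarded to the new one.
        Configuration.addDeprecation("my.old.key", "my.new.key");
        System.out.println(Configuration.isDeprecated("my.old.key"));  // true
        System.out.println(Configuration.isDeprecated("my.new.key"));  // false
      }
    }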

9. _________ is useful for iterating the properties when all deprecated properties for currently set properties need to be present.
a) addResource
b) setDeprecatedProperties
c) addDefaultResource
d) none of the mentioned

View Answer

Answer: b [Reason:] setDeprecatedProperties sets all deprecated properties that are not currently set but have a corresponding new property that is set.

10. Which of the following adds a configuration resource?
a) addResource
b) setDeprecatedProperties
c) addDefaultResource
d) addResource

View Answer

Answer: d [Reason:] The properties of this resource will override properties of previously added resources, unless they were marked final.