
Hadoop MCQ Set 1

1. The _________ codec from Google provides modest compression ratios.
a) Snapcheck
b) Snappy
c) FileCompress
d) None of the mentioned

View Answer

Answer: b [Reason:] Snappy has fast compression and decompression speeds.
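
For context, a minimal sketch of how a codec such as Snappy is used through Hadoop's CompressionCodec interface is shown below; the file paths are hypothetical, and the snippet assumes the native Snappy libraries are available to Hadoop.

```java
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Instantiate the Snappy codec through Hadoop's reflection utilities.
        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);

        // "input.txt" is a hypothetical path; the output uses the codec's default ".snappy" extension.
        InputStream in = fs.open(new Path("input.txt"));
        OutputStream rawOut = fs.create(new Path("output" + codec.getDefaultExtension()));

        // Wrap the raw stream so everything written through it is Snappy-compressed.
        CompressionOutputStream out = codec.createOutputStream(rawOut);
        IOUtils.copyBytes(in, out, conf);
        out.finish();
        out.close();
        in.close();
    }
}
```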

2. Point out the correct statement:
a) Snappy is licensed under the GNU Public License (GPL)
b) BgCIK needs to create an index when it compresses a file
c) The Snappy codec is integrated into Hadoop Common, a set of common utilities that supports other Hadoop subprojects
d) None of the mentioned

View Answer

Answer: c [Reason:] You can use Snappy as an add-on for versions of Hadoop that do not yet provide Snappy codec support.
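
As a rough illustration of using the codec inside a job, the sketch below enables Snappy for intermediate map output; the mapreduce.* property names assume Hadoop 2.x, and the job name is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class SnappyJobConfig {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();

        // Compress intermediate map output with Snappy.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                      SnappyCodec.class, CompressionCodec.class);

        // "snappy-map-output-example" is a hypothetical job name.
        return Job.getInstance(conf, "snappy-map-output-example");
    }
}
```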

3. Which of the following compression formats is similar to Snappy compression?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

View Answer

Answer: a [Reason:] LZO is only really desirable if you need to compress text files.

4. Which of the following supports splittable compression?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

View Answer

Answer: a [Reason:] LZO enables the parallel processing of compressed text file splits by your MapReduce jobs.

5. Point out the wrong statement:
a) From a usability standpoint, LZO and Gzip are similar.
b) Bzip2 generates a better compression ratio than does Gzip, but it’s much slower
c) Gzip is a compression utility that was adopted by the GNU project
d) None of the mentioned

View Answer

Answer: a [Reason:] From a usability standpoint, Bzip2 and Gzip are similar.

6. Which of the following is the slowest compression technique?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

View Answer

Answer: b [Reason:] Of all the available compression codecs in Hadoop, Bzip2 is by far the slowest.

7. Gzip (short for GNU zip) generates compressed files that have a _________ extension.
a) .gzip
b) .gz
c) .gzp
d) .g

View Answer

Answer: b [Reason:] You can use the gunzip command to decompress files that were created by a number of compression utilities, including Gzip.

8. Which of the following is based on the DEFLATE algorithm?
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

View Answer

Answer: c [Reason:] gzip is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman Coding.
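
Because the JDK exposes the same DEFLATE algorithm, a small self-contained sketch can illustrate the relationship between raw DEFLATE output and a gzip stream (gzip adds a header and a CRC-32 trailer); the class name and sample text are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class DeflateDemo {
    public static void main(String[] args) throws Exception {
        byte[] input = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);

        // Raw DEFLATE output: LZ77 matching followed by Huffman coding.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[256];
        int deflatedLen = deflater.deflate(buf);
        deflater.end();

        // Gzip wraps the same DEFLATE stream with a small header and a CRC-32 trailer.
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(gz)) {
            out.write(input);
        }

        System.out.println("original: " + input.length + " bytes, deflate: "
                + deflatedLen + " bytes, gzip: " + gz.size() + " bytes");
    }
}
```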

9. __________ typically compresses files to within 10% to 15% of the best available techniques.
a) LZO
b) Bzip2
c) Gzip
d) All of the mentioned

View Answer

Answer: b [Reason:] bzip2 is a freely available, patent-free, high-quality data compressor.

10. The LZO compression format is composed of approximately __________ blocks of compressed data.
a) 128k
b) 256k
c) 24k
d) 36k

View Answer

Answer: b [Reason:] LZO was designed with speed in mind: it decompresses about twice as fast as gzip, meaning it’s fast enough to keep up with hard drive read speeds.

Hadoop MCQ Set 2

1. The Apache Crunch Java library provides a framework for writing, testing, and running ___________ pipelines.
a) MapReduce
b) Pig
c) Hive
d) None of the mentioned

View Answer

Answer: a [Reason:] The goal of Crunch is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
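
A minimal Crunch pipeline sketch, loosely following the word-count example from the Crunch documentation, is shown below; the "in" and "out" paths are hypothetical.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;
import org.apache.hadoop.conf.Configuration;

public class CrunchWordCount {
    public static void main(String[] args) {
        // MRPipeline runs the pipeline as a series of MapReduce jobs.
        Pipeline pipeline = new MRPipeline(CrunchWordCount.class, new Configuration());

        // "in" and "out" are hypothetical paths.
        PCollection<String> lines = pipeline.readTextFile("in");

        // parallelDo applies a DoFn to every element of the PCollection.
        PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
            @Override
            public void process(String line, Emitter<String> emitter) {
                for (String word : line.split("\\s+")) {
                    emitter.emit(word);
                }
            }
        }, Writables.strings());

        // count() groups identical words and sums their occurrences.
        PTable<String, Long> counts = words.count();

        pipeline.writeTextFile(counts, "out");
        pipeline.done();
    }
}
```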

2. Point out the correct statement:
a) Scrunch’s Java API is centered around three interfaces that represent distributed datasets
b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives
c) A number of common Aggregator implementations are provided in the Aggregators class
d) All of the mentioned

View Answer

Answer: c [Reason:] PGroupedTable provides a combineValues operation that allows a commutative and associative Aggregator to be applied to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle.
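
To illustrate that reason, here is a hedged sketch of combineValues with one of the stock Aggregators; it assumes a PTable<String, Long> produced earlier in a pipeline, and the method and class names are hypothetical.

```java
import org.apache.crunch.PGroupedTable;
import org.apache.crunch.PTable;
import org.apache.crunch.fn.Aggregators;

public class CombineValuesSketch {
    // "counts" is a hypothetical PTable<String, Long> produced earlier in a pipeline.
    public static PTable<String, Long> sumByKey(PTable<String, Long> counts) {
        // Group the values for each key, then apply a commutative, associative
        // aggregator on both the map and reduce sides of the shuffle.
        PGroupedTable<String, Long> grouped = counts.groupByKey();
        return grouped.combineValues(Aggregators.SUM_LONGS());
    }
}
```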

3. For Scala users, there is the __________ API, which is built on top of the Java APIs.
a) Prunch
b) Scrunch
c) Hivench
d) All of the mentioned

View Answer

Answer: b [Reason:] It includes a REPL (read-eval-print loop) for creating MapReduce pipelines.

4. The Crunch APIs are modeled after _________, which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
a) FlagJava
b) FlumeJava
c) FlakeJava
d) All of the mentioned

View Answer

Answer: b [Reason:] The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop.

5. Point out the wrong statement:
a) A Crunch pipeline written by the development team sessionizes a set of user logs, which are then processed by a diverse collection of Pig scripts and Hive queries
b) Crunch pipelines provide a thin veneer on top of MapReduce
c) Developers have access to low-level MapReduce APIs
d) None of the mentioned

View Answer

Answer: d [Reason:] Crunch is extremely fast, only slightly slower than a hand-tuned pipeline developed with the MapReduce APIs.

6. Crunch was designed for developers who understand __________ and want to use MapReduce effectively.
a) Java
b) Python
c) Scala
d) Javascript

View Answer

Answer: a [Reason:] Crunch is often used in conjunction with Hive and Pig.

7. Hive, Pig, and Cascading all use a _________ data model.
a) value centric
b) columnar
c) tuple-centric
d) none of the mentioned

View Answer

Answer: c [Reason:] Crunch allows developers considerable flexibility in how they represent their data, which makes Crunch the best pipeline platform for developers.

8. A __________ represents a distributed, immutable collection of elements of type T.
a) PCollect
b) PCollection
c) PCol
d) All of the mentioned

View Answer

Answer: b [Reason:] PCollection provides a method, parallelDo, that applies a DoFn to each element in the PCollection.

9. ___________ executes the pipeline as a series of MapReduce jobs.
a) SparkPipeline
b) MRPipeline
c) MemPipeline
d) None of the mentioned

View Answer

Answer: b [Reason:] Every Crunch data pipeline is coordinated by an instance of the Pipeline interface.

10. __________ represent the logical computations of your Crunch pipelines.
a) DoFns
b) DoFn
c) ThreeFns
d) None of the mentioned

View Answer

Answer: a [Reason:] DoFns are designed to be easy to write, easy to test, and easy to deploy within the context of a MapReduce job.

Hadoop MCQ Set 3

1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene

View Answer

Answer: b [Reason:] MapReduce is the heart of Hadoop.

2. Point out the correct statement:
a) Data locality means movement of the algorithm to the data instead of the data to the algorithm
b) When processing is done on the data, the algorithm is moved across the Action Nodes rather than the data to the algorithm
c) Moving Computation is more expensive than Moving Data
d) None of the mentioned

View Answer

Answer: a [Reason:] The Hadoop data-flow framework possesses the feature of data locality.

3. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) all of the mentioned

View Answer

Answer: a [Reason:] MapReduce jobs are submitted to the job-tracker.

4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
a) DataNodes
b) TaskTracker
c) ActionNodes
d) All of the mentioned

View Answer

Answer: b [Reason:] A heartbeat is sent from the TaskTracker to the JobTracker at regular intervals so the JobTracker can check whether the node is still alive.

5. Point out the wrong statement:
a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) → list(K2, V2)
b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) → list(K3, V3)
c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs
d) None of the mentioned

View Answer

Answer: c [Reason:] MapReduce is a relatively simple model to implement in Hadoop.
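
To make the general form concrete, here is a hedged word-count sketch with the corresponding Java signatures; TokenMapper and SumReducer are hypothetical class names, and the LongWritable input key is the byte offset supplied by the default TextInputFormat.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// map: (K1, V1) -> list(K2, V2), here (LongWritable, Text) -> list(Text, IntWritable)
class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);   // emit (K2, V2)
        }
    }
}

// reduce: (K2, list(V2)) -> list(K3, V3), here (Text, list(IntWritable)) -> list(Text, IntWritable)
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```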

6. The InputFormat class calls the ________ function, computes splits for each file, and then sends them to the jobtracker.
a) puts
b) gets
c) getSplits
d) all of the mentioned

View Answer

Answer: c [Reason:] The jobtracker uses the splits' storage locations to schedule map tasks to process them on the tasktrackers.

7. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split.
a) InputReader
b) RecordReader
c) OutputReader
d) None of the mentioned

View Answer

Answer: b [Reason:] The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.

8. The default InputFormat is __________, which treats each line of the input as a separate record; the value is the line's contents and the associated key is its byte offset.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned

View Answer

Answer: b [Reason:] A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.

9. __________ controls the partitioning of the keys of the intermediate map-outputs.
a) Collector
b) Partitioner
c) InputFormat
d) None of the mentioned

View Answer

Answer: b [Reason:] The output of the mapper is sent to the partitioner.
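
A minimal sketch of a custom Partitioner, assuming Text keys and IntWritable values; FirstLetterPartitioner is a hypothetical name, and the default when no partitioner is set is HashPartitioner.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each intermediate (key, value) pair emitted by the mappers to a reduce partition.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String k = key.toString();
        if (k.isEmpty()) {
            return 0;
        }
        // All keys starting with the same character go to the same reducer.
        return (k.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}
// Registered on a job with job.setPartitionerClass(FirstLetterPartitioner.class).
```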

10. The output of the mapper is first written to the local disk for the sorting and _________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing

View Answer

Answer: a [Reason:] All values corresponding to the same key will go to the same reducer.

Hadoop MCQ Set 4

1. The HDFS client software implements __________ checking on the contents of HDFS files.
a) metastore
b) parity
c) checksum
d) none of the mentioned

View Answer

Answer: c [Reason:] When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace.
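
A minimal client-side sketch that asks HDFS for a file's checksum, assuming the Configuration points at an HDFS cluster; the path and class name are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // "/user/data/sample.txt" is a hypothetical HDFS path.
        FileChecksum checksum = fs.getFileChecksum(new Path("/user/data/sample.txt"));

        // Some file systems return null here; HDFS reports a block-based checksum.
        if (checksum != null) {
            System.out.println(checksum.getAlgorithmName() + ": " + checksum);
        }
    }
}
```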

2. Point out the correct statement:
a) The HDFS architecture is compatible with data rebalancing schemes
b) Datablocks support storing a copy of data at a particular instant of time
c) HDFS currently supports snapshots
d) None of the mentioned

View Answer

Answer: a [Reason:] A scheme might automatically move data from one DataNode to another if the free space on a DataNode falls below a certain threshold.

3. The ___________ machine is a single point of failure for an HDFS cluster.
a) DataNode
b) NameNode
c) ActionNode
d) All of the mentioned

View Answer

Answer: b [Reason:] If the NameNode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the NameNode software to another machine is not supported.

4. The ____________ and the EditLog are central data structures of HDFS.
a) DsImage
b) FsImage
c) FsImages
d) All of the mentioned

View Answer

Answer: b [Reason:] A corruption of these files can cause the HDFS instance to be non-functional.

5. Point out the wrong statement:
a) HDFS is designed to support small files only
b) Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously
c) NameNode can be configured to support maintaining multiple copies of the FsImage and EditLog
d) None of the mentioned

View Answer

Answer: a [Reason:] HDFS is designed to support very large files.

6. __________ support storing a copy of data at a particular instant of time.
a) Data Image
b) Datanots
c) Snapshots
d) All of the mentioned

View Answer

Answer: c [Reason:] One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time.

7. Automatic restart and ____________ of the NameNode software to another machine is not supported.
a) failover
b) end
c) scalability
d) all of the mentioned

View Answer

Answer: a [Reason:] If the NameNode machine fails, manual intervention is necessary.

8. HDFS, by default, replicates each data block _____ times on different nodes and on at least ____ racks.
a) 3,2
b) 1,2
c) 2,3
d) All of the mentioned

View Answer

Answer: a [Reason:] HDFS has a simple yet robust architecture that was explicitly designed for data reliability in the face of faults and failures in disks, nodes and networks.
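
A small sketch of inspecting the default replication factor and requesting a different one for a single file; the path is hypothetical and the snippet assumes an HDFS-backed Configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The cluster-wide default lives in the dfs.replication property (3 by default).
        System.out.println("default replication: " + conf.get("dfs.replication", "3"));

        FileSystem fs = FileSystem.get(conf);
        // Request a different replication factor for a single (hypothetical) file.
        fs.setReplication(new Path("/user/data/important.log"), (short) 5);
    }
}
```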

9. _________ stores its metadata on multiple disks that typically include a non-local file server.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned

View Answer

Answer: b [Reason:] HDFS tolerates failures of storage servers (called DataNodes) and its disks.

10. The HDFS file system is temporarily unavailable whenever the HDFS ________ is down.
a) DataNode
b) NameNode
c) ActionNode
d) None of the mentioned

View Answer

Answer: b [Reason:] When the HDFS NameNode is restarted it recovers its metadata.

Hadoop MCQ Set 5

1. Which of the following is a shortcut for the DUMP operator?
a) de alias
b) d alias
c) q
d) None of the mentioned

View Answer

Answer: b [Reason:] If the alias is omitted, the last defined alias will be used.

2. Point out the correct statement:
a) Invoke the Grunt shell using the “enter” command
b) Pig does not support jar files
c) Both the run and exec commands are useful for debugging because you can modify a Pig script in an editor
d) All of the mentioned

View Answer

Answer: c [Reason:] Both commands promote Pig script modularity as they allow you to reuse existing components.

3. Which of the following commands is used to show the values of keys used in Pig?
a) set
b) declare
c) display
d) All of the mentioned

View Answer

Answer: a [Reason:] All Pig and Hadoop properties can be set, either in the Pig script or via the Grunt command line.

4. Use the __________ command to run a Pig script that can interact with the Grunt shell (interactive mode).
a) fetch
b) declare
c) run
d) all of the mentioned

View Answer

Answer: c [Reason:] With the run command, every store triggers execution.

5. Point out the wrong statement:
a) You can run Pig scripts from the command line and from the Grunt shell
b) DECLARE defines a Pig macro
c) Use Pig scripts to place Pig Latin statements and Pig commands in a single file
d) None of the mentioned

View Answer

Answer: b [Reason:] DEFINE defines a Pig macro.

6. Which of the following commands can be used for debugging?
a) exec
b) execute
c) error
d) throw

View Answer

Answer: a [Reason:] With the exec command, store statements will not trigger execution; rather, the entire script is parsed before execution starts.

7. Which of the following files contains user-defined functions (UDFs)?
a) script2-local.pig
b) pig.jar
c) tutorial.jar
d) excite.log.bz2

View Answer

Answer: c [Reason:] tutorial.jar also contains Java classes.

8. Which of the following is the correct syntax for parameter substitution from the command line?
a) pig {-param param_name = param_value | -param_file file_name} [-debug | -dryrun] script
b) {%declare | %default} param_name param_value
c) {%declare | %default} param_name param_value cmd
d) All of the mentioned

View Answer

Answer: a [Reason:] Parameter Substitution is used to substitute values for parameters at run time.
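
For illustration, parameters can also be bound programmatically; the sketch below assumes the PigServer.registerScript overload that accepts a parameter map, and the script and path names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class ParamSubstitutionExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);

        // Values bound here replace $input / $output placeholders in the script,
        // the programmatic analogue of `pig -param input=... -param output=... script.pig`.
        Map<String, String> params = new HashMap<>();
        params.put("input", "students.txt");   // hypothetical paths
        params.put("output", "results");

        pig.registerScript("script.pig", params);
    }
}
```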

9. You can specify parameter names and parameter values in one of the following ways:
a) As part of a command line.
b) In parameter file, as part of a command line
c) With the declare statement, as part of Pig script
d) All of the mentioned

View Answer

Answer: d [Reason:] Parameter substitution may be used inside of macros.

10. _________ are scanned in the order they are specified on the command line.
a) Command line parameters
b) Parameter files
c) Declare and default preprocessors
d) Both parameter files and command line parameters

View Answer

Answer: d [Reason:] Parameter files and command line parameters are scanned in FIFO order.