Hadoop MCQ Set 1

1. Which of the following nodes is responsible for executing a task assigned to it by the JobTracker ?
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker

View Answer

Answer: c [Reason:] The TaskTracker receives the information necessary for executing a task from the JobTracker, executes the task, and sends the results back to the JobTracker.

2. Point out the correct statement :
a) MapReduce tries to place the data and the compute as close as possible
b) Map Task in MapReduce is performed using the Mapper() function
c) Reduce Task in MapReduce is performed using the Map() function
d) All of the mentioned

View Answer

Answer: a [Reason:] This feature of MapReduce is “Data Locality”.

3. The ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned

View Answer

Answer: a [Reason:] Map Task in MapReduce is performed using the Map() function.

4. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned

View Answer

Answer: a [Reason:] Reduce function collates the work and resolves the results.
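
To make the consolidation step concrete, here is a minimal sketch of a reduce function using Hadoop's org.apache.hadoop.mapreduce API. It assumes the classic word-count job; the class name SumReducer is illustrative and not taken from the questions above.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Illustrative word-count reducer: consolidates the counts emitted by the map tasks.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();                      // add up the partial counts for this key
            }
            context.write(key, new IntWritable(sum));    // one consolidated result per key
        }
    }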

5. Point out the wrong statement :
a) A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
d) None of the mentioned

View Answer

Answer: d [Reason:] The MapReduce framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.

6. Although the Hadoop framework is implemented in Java, MapReduce applications need not be written in :
a) Java
b) C
c) C#
d) None of the mentioned

View Answer

Answer: a [Reason:] Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non-JNI based).

7. ________ is a utility which allows users to create and run jobs with any executable as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned

View Answer

Answer: b [Reason:] Hadoop streaming is one of the most important utilities in the Apache Hadoop distribution.

8. __________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned

View Answer

Answer: a [Reason:] Maps are the individual tasks that transform input records into intermediate records.
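
As a counterpart to the reducer sketched earlier, here is a minimal mapper that transforms input records into intermediate key/value pairs, again assuming a word-count style job; the class name TokenMapper and the whitespace tokenization are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative word-count mapper: turns each input line into intermediate (word, 1) pairs.
    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit an intermediate key/value pair
                }
            }
        }
    }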

9. The number of maps is usually driven by the total size of :
a) inputs
b) outputs
c) tasks
d) none of the mentioned

View Answer

Answer: a [Reason:] Total size of inputs means the total number of blocks of the input files; for example, 10 GB of input with a 128 MB block size yields roughly 80 map tasks.

10. Running a ___________ program involves running mapping tasks on many or all of the nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
d) All of the mentioned

View Answer

Answer: a [Reason:] Running a MapReduce program involves running mapping tasks on many or all of the nodes in the cluster; in some applications, component tasks also need to create and/or write to side-files, which differ from the actual job-output files.
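
The driver below is a minimal sketch of how such a program is submitted, wiring together the TokenMapper and SumReducer sketched above; the input and output paths are placeholders supplied on the command line.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenMapper.class);      // map tasks run on many or all nodes
            job.setReducerClass(SumReducer.class);      // reduce tasks consolidate the map output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }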

Hadoop MCQ Set 2

1. Which of the following projects is an interface definition language for Hadoop ?
a) Oozie
b) Mahout
c) Thrift
d) Impala

View Answer

Answer: c [Reason:] Thrift is an interface definition language and binary communication protocol that is used to define and create services for numerous languages.

2. Point out the correct statement :
a) Thrift is developed for scalable cross-language services development
b) Thrift includes a complete stack for creating clients and servers
c) The top part of the Thrift stack is generated code from the Thrift definition
d) All of the mentioned

View Answer

Answer: d [Reason:] The client and processor code for the services is generated from the Thrift definition file.

3. __________ is used as a remote procedure call (RPC) framework at Facebook.
a) Oozie
b) Mahout
c) Thrift
d) Impala

View Answer

Answer: c [Reason:] Thrift was developed at Facebook for use as its remote procedure call (RPC) framework and was later open-sourced as an Apache project.

4. Which of the following is a straightforward binary format ?
a) TCompactProtocol
b) TDenseProtocol
c) TBinaryProtocol
d) TSimpleJSONProtocol

View Answer

Answer: c [Reason:] TBinaryProtocol is not optimized for space efficiency.
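
The sketch below shows how a protocol such as TBinaryProtocol is layered on top of a transport in Thrift's Java library. The endpoint is a placeholder, and Calculator.Client stands for hypothetical code generated from a .thrift definition, so the actual remote call is left as a comment.

    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftClientSketch {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9090);   // placeholder endpoint
            transport.open();
            try {
                // TBinaryProtocol: the straightforward binary encoding (not space-optimized).
                TProtocol protocol = new TBinaryProtocol(transport);
                // Hypothetical client generated from a .thrift service definition:
                // Calculator.Client client = new Calculator.Client(protocol);
                // int result = client.add(2, 3);
            } finally {
                transport.close();
            }
        }
    }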

5. Point out the wrong statement :
a) With Thrift, it is possible to define a service and change the protocol and transport without recompiling the code.
b) Thrift includes server infrastructure to tie protocols and transports together, like blocking, non-blocking, and multi-threaded servers.
c) Thrift supports a number of protocols for service definition
d) None of the mentioned

View Answer

Answer: d [Reason:] The underlying I/O part of the stack is differently implemented for different languages.

6. Which of the following is a more compact binary format ?
a) TCompactProtocol
b) TDenseProtocol
c) TBinaryProtocol
d) TSimpleJSONProtocol

View Answer

Answer: a [Reason:] TCompactProtocol is typically more efficient to process as well.

7. Which of the following format is similar to TCompactProtocol ?
a) TCompactProtocol
b) TDenseProtocol
c) TBinaryProtocol
d) TSimpleJSONProtocol

View Answer

Answer: b [Reason:] TDenseProtocol is similar to TCompactProtocol, but strips off the meta information from what is transmitted.

8. ________ is a write-only protocol that cannot be parsed by Thrift.
a) TCompactProtocol
b) TDenseProtocol
c) TBinaryProtocol
d) TSimpleJSONProtocol

View Answer

Answer: d [Reason:] TSimpleJSONProtocol is a write-only protocol that drops metadata, producing JSON output suitable for parsing by scripting languages.

9. Which of the following uses JSON for encoding of data ?
a) TCompactProtocol
b) TDenseProtocol
c) TBinaryProtocol
d) None of the mentioned

View Answer

Answer: d [Reason:] TJSONProtocol, which is not among the options listed, uses JSON for encoding of data.

10. _____________ is a human-readable text format to aid in debugging.
a) TMemory
b) TDebugProtocol
c) TBinaryProtocol
d) TSimpleJSONProtocol

View Answer

Answer: b [Reason:] TDebugProtocol is a human-readable text format to aid debugging; TBinaryProtocol is faster to process than the text protocol but more difficult to debug.

Hadoop MCQ Set 3

1. ________ is the architectural center of Hadoop that allows multiple data processing engines.
a) YARN
b) Hive
c) Incubator
d) Chuckwa

View Answer

Answer: a [Reason:] YARN is the prerequisite for Enterprise Hadoop, providing resource management and a central platform to deliver consistent operations, security, and data governance tools across Hadoop clusters.

2. Point out the correct statement :
a) YARN also extends the power of Hadoop to incumbent and new technologies found within the data center
b) YARN is the central point of investment for Hortonworks within the Apache community
c) YARN enhances a Hadoop compute cluster in many ways
d) All of the mentioned

View Answer

Answer: d [Reason:] YARN provides ISVs and developers a consistent framework for writing data access applications that run IN Hadoop.

3. YARN’s dynamic allocation of cluster resources improves utilization over more static _______ rules used in early versions of Hadoop.
a) Hive
b) MapReduce
c) Impala
d) All of the mentioned

View Answer

Answer: b [Reason:] Multi-tenant data processing improves an enterprise’s return on its Hadoop investments.

4. The __________ is a framework-specific entity that negotiates resources from the ResourceManager.
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned

View Answer

Answer: c [Reason:] Each ApplicationMaster has responsibility for negotiating appropriate resource containers from the Scheduler.
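
A rough sketch of that negotiation using the AMRMClient API is shown below. It can only actually run inside a container the ResourceManager has launched for the ApplicationMaster, and the hostname, resource sizes and priority are illustrative values.

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class AppMasterSketch {
        public static void main(String[] args) throws Exception {
            AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
            rmClient.init(new YarnConfiguration());
            rmClient.start();

            // Register this ApplicationMaster with the ResourceManager.
            rmClient.registerApplicationMaster("am-host.example.com", 0, "");

            // Ask the Scheduler for one container with 1024 MB of memory and 1 virtual core.
            Resource capability = Resource.newInstance(1024, 1);
            rmClient.addContainerRequest(
                new ContainerRequest(capability, null, null, Priority.newInstance(0)));

            // Heartbeat; any containers granted so far come back in the response.
            AllocateResponse response = rmClient.allocate(0.1f);
            System.out.println("Allocated containers: " + response.getAllocatedContainers().size());

            rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
            rmClient.stop();
        }
    }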

5. Point out the wrong statement :
a) From the system perspective, the ApplicationMaster runs as a normal container.
b) The ResourceManager is the per-machine slave, which is responsible for launching the applications’ containers
c) The NodeManager is the per-machine slave, which is responsible for launching the applications’ containers, monitoring their resource usage
d) None of the mentioned

View Answer

Answer: b [Reason:] ResourceManager has a scheduler, which is responsible for allocating resources to the various applications running in the cluster, according to constraints such as queue capacities and user limits.

6. Apache Hadoop YARN stands for :
a) Yet Another Reserve Negotiator
b) Yet Another Resource Network
c) Yet Another Resource Negotiator
d) All of the mentioned

View Answer

Answer: c [Reason:] YARN is a cluster management technology.

7. MapReduce has undergone a complete overhaul in hadoop :
a) 0.21
b) 0.23
c) 0.24
d) 0.26

View Answer

Answer: b [Reason:] The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker.

8. The ____________ is the ultimate authority that arbitrates resources among all the applications in the system.
a) NodeManager
b) ResourceManager
c) ApplicationMaster
d) All of the mentioned

View Answer

Answer: b [Reason:] The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework.
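
Because the ResourceManager is the cluster-wide authority, clients can ask it for the state of every running application; a small YarnClient sketch follows, assuming the cluster's yarn-site.xml is on the classpath.

    import java.util.List;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ListApplicationsSketch {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());   // reads the cluster's yarn-site.xml
            yarnClient.start();
            // Ask the ResourceManager what is currently running in the cluster.
            List<ApplicationReport> apps = yarnClient.getApplications();
            for (ApplicationReport app : apps) {
                System.out.println(app.getApplicationId() + "  " + app.getName()
                        + "  " + app.getYarnApplicationState());
            }
            yarnClient.stop();
        }
    }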

9. The __________ is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc.
a) Manager
b) Master
c) Scheduler
d) None of the mentioned

View Answer

Answer: c [Reason:] The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application.

10. The CapacityScheduler supports _____________ queues to allow for more predictable sharing of cluster resources.
a) Networked
b) Hierarchical
c) Partition
d) None of the mentioned

View Answer

Answer: b [Reason:] The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc.

Hadoop MCQ Set 4

1. Users can easily run Spark on top of Amazon’s __________
a) Infosphere
b) EC2
c) EMR
d) None of the mentioned

View Answer

Answer: b [Reason:] Users can easily run Spark (and Shark) on top of Amazon's EC2, either using the scripts that come with Spark or using a hosted environment such as Amazon EMR.

2. Point out the correct statement :
a) Spark enables Apache Hive users to run their unmodified queries much faster
b) Spark interoperates only with Hadoop
c) Spark is a popular data warehouse solution running on top of Hadoop
d) None of the mentioned

View Answer

Answer: a [Reason:] Shark can accelerate Hive queries by as much as 100x when the input data fits into memory, and up to 10x when the input data is stored on disk.

3. Spark runs on top of ___________, a cluster manager system which provides efficient resource isolation across distributed applications.
a) Mesjs
b) Mesos
c) Mesus
d) All of the mentioned

View Answer

Answer: b [Reason:] Mesos enables fine grained sharing which allows a Spark job to dynamically take advantage of the idle resources in the cluster during its execution.

4. Which of the following can be used to launch Spark jobs inside MapReduce ?
a) SIM
b) SIMR
c) SIR
d) RIS

View Answer

Answer: b [Reason:] With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it.

5. Point out the wrong statement :
a) Spark is intended to replace, the Hadoop stack
b) Spark was designed to read and write data from and to HDFS, as well as other storage systems
c) Hadoop users who have already deployed or are planning to deploy Hadoop Yarn can simply run Spark on YARN
d) None of the mentioned

View Answer

Answer: a [Reason:] Spark is intended to enhance, not replace, the Hadoop stack.

6. Which of the following language is not supported by Spark ?
a) Java
b) Pascal
c) Scala
d) Python

View Answer

Answer: b [Reason:] Spark applications can be written in Java, Scala, Python, and R; Pascal is not supported. The Spark engine itself runs in a variety of environments, from cloud services to Hadoop or Mesos clusters.

7. Spark is packaged with higher level libraries, including support for _________ queries.
a) SQL
b) C
c) C++
d) None of the mentioned

View Answer

Answer: a [Reason:] Standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

8. Spark includes a collection over ________ operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
a) 50
b) 60
c) 70
d) 80

View Answer

Answer: d [Reason:] Spark provides easy-to-use APIs for operating on large datasets.
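
A small Java sketch of chaining a few of those operators on an RDD is shown below; the local master setting and the numbers are illustrative only.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkOperatorsSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("operators-sketch").setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
                // Chain two transformation operators, then reduce to a single value.
                int sumOfEvenSquares = nums.map(x -> x * x)
                                           .filter(x -> x % 2 == 0)
                                           .reduce(Integer::sum);
                System.out.println("Sum of even squares: " + sumOfEvenSquares);   // prints 20
            }
        }
    }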

9. Spark is engineered from the bottom-up for performance, running ___________ faster than Hadoop by exploiting in memory computing and other optimizations.
a) 100x
b) 150x
c) 200x
d) None of the mentioned

View Answer

Answer: a [Reason:] Spark is fast on disk too; it currently holds the world record in large scale on-disk sorting.

10. Spark powers a stack of high-level tools including Spark SQL, MLlib for :
a) regression models
b) statistics
c) machine learning
d) reproductive research

View Answer

Answer: c [Reason:] Spark is used at a wide range of organizations to process large datasets.

Hadoop MCQ Set 5

1. __________ runs Demux parsers internally to convert unstructured data to semi-structured data, then loads the key/value pairs into an HBase table.
a) HCatWriter
b) HBWriter
c) HBaseWriter
d) None of the mentioned

View Answer

Answer: c [Reason:] The Demux parser class package is specified by hbase.demux.package; HBaseWriter uses this package name to locate the annotated Demux parser classes.

2. Point out the correct statement :
a) Chukwa supports two different reliability strategies
b) chukwaCollector.asyncAcks.scantime affects how often collectors will check the filesystem for commits
c) chukwaCollector.asyncAcks.scanperiod defaults to thrice the rotation interval
d) All of the mentioned

View Answer

Answer: a [Reason:] The first, default strategy, is as follows: collectors write data to HDFS, and as soon as the HDFS write call returns success, report success to the agent, which advances its checkpoint state.

3. The __________ streams chunks of data to HDFS, and write data in temp filename with .chukwa suffix.
a) LocalWriter
b) SeqFileWriter
c) SocketTeeWriter
d) All of the mentioned

View Answer

Answer: b [Reason:] When writing is complete, the file is renamed with a .done suffix. SeqFileWriter is configured in chukwa-collector-conf.xml.

4. Conceptually, each _________ emits a semi-infinite stream of bytes, numbered starting from zero.
a) Collector
b) Adaptor
c) Compactor
d) LocalWriter

View Answer

Answer: b [Reason:] A Chunk is a sequence of bytes with some metadata; several of the metadata fields are set automatically by the Agent or Adaptors.

5. Point out the wrong statement :
a) Filters use the same syntax as the Dump command
b) “RAW” will send the internal data of the Chunk, without any metadata, prefixed by its length encoded as a 32-bit int
c) Specifying “WRITABLE” will cause the chunks to be written using Hadoop’s Writable serialization framework
d) None of the mentioned

View Answer

Answer: d [Reason:] “HEADER” is similar to “RAW”, but with a one-line header in front of the content.
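
Based on the "RAW" format described above (each chunk's internal data preceded by a 32-bit length), a consumer can be sketched as below. The subscription handshake that selects RAW output and a filter is omitted, and the host and port are placeholders for the collector's tee endpoint.

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.net.Socket;

    public class RawChunkReaderSketch {
        public static void main(String[] args) throws IOException {
            try (Socket socket = new Socket("collector-host.example.com", 9094);
                 DataInputStream in = new DataInputStream(socket.getInputStream())) {
                while (true) {
                    int length = in.readInt();          // 32-bit length prefix
                    byte[] payload = new byte[length];
                    in.readFully(payload);              // the chunk's internal data, no metadata
                    System.out.println("Received chunk of " + length + " bytes");
                }
            }
        }
    }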

6. The _____________ allows external processes to watch the stream of chunks passing through the collector.
a) LocalWriter
b) SeqFileWriter
c) SocketTeeWriter
d) All of the mentioned

View Answer

Answer: c [Reason:] SocketTeeWriter listens on a port (specified by conf option chukwaCollector.tee.port, defaulting to 9094.)

7. Data analytics scripts are written in ____________.
a) Hive
b) CQL
c) PigLatin
d) Java

View Answer

Answer: c [Reason:] Data stored in HBase are aggregated by data analytic scripts to provide visualization and interpretation of health of Hadoop cluster.

8. If demux is successful within ___ attempts, Chukwa archives the completed files.
a) one
b) two
c) three
d) all of the mentioned

View Answer

Answer: c [Reason:] The Demux MapReduce job is run on the data in demuxProcessing/mrInput.

9. Chukwa is ___________ data collection system for managing large distributed systems.
a) open source
b) proprietary
c) service based
d) none of the mentioned

View Answer

Answer: a [Reason:] Chukwa is built on top of the Hadoop Distributed File System (HDFS) and Map/Reduce framework and inherits Hadoop’s scalability and robustness.

10. Collectors write chunks to logs/*.chukwa files until a ___ MB chunk is reached.
a) 64
b) 108
c) 256
d) 1024

View Answer

Answer: a [Reason:] Collectors write chunks to logs/*.chukwa files until a 64 MB chunk size is reached or a given time interval has passed. PostProcessManager later wakes up every few minutes and aggregates, orders and de-dups the record files.

Hadoop MCQ Set 6

1. __________ abstract class has three main methods for loading data and for most use cases it would suffice to extend it.
a) Load
b) LoadFunc
c) FuncLoad
d) None of the mentioned

View Answer

Answer: b [Reason:] LoadFunc and StoreFunc implementations should use the Hadoop 20 API based classes.

2. Point out the correct statement :
a) LoadMeta has methods to convert byte arrays to specific types
b) The Pig load/store API is aligned with Hadoop’s InputFormat class only
c) LoadPush has methods to push operations from Pig runtime into loader implementations
d) All of the mentioned

View Answer

Answer: c [Reason:] Currently only the pushProjection() method is called by Pig to communicate to the loader the exact fields that are required in the Pig script.

3. Which of the following has methods to deal with metadata ?
a) LoadPushDown
b) LoadMetadata
c) LoadCaster
d) All of the mentioned

View Answer

Answer: b [Reason:] Most loader implementations don't need to implement this unless they interact with some metadata system.

4. ____________ method will be called by Pig both in the front end and back end to pass a unique signature to the Loader.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) getShipFiles()

View Answer

Answer: b [Reason:] The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end.

5. Point out the wrong statement :
a) The load/store UDFs control how data goes into Pig and comes out of Pig.
b) LoadCaster has methods to convert byte arrays to specific types.
c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
d) None of the mentioned

View Answer

Answer: c [Reason:] The meaning of getNext() has not changed and is called by Pig runtime to get the next tuple in the data.

6. ___________ returns a list of HDFS files to ship to the distributed cache.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) getShipFiles()

View Answer

Answer: d [Reason:] The default implementation provided in LoadFunc handles this for FileSystem locations.

7. The loader should use ______ method to communicate the load information to the underlying InputFormat.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) setLocation()

View Answer

Answer: d [Reason:] setLocation() method is called by Pig to communicate the load location to the loader.

8. Through the ____________ method, the RecordReader associated with the InputFormat provided by the LoadFunc is passed to the LoadFunc.
a) getNext()
b) relativeToAbsolutePath()
c) prepareToRead()
d) All of the mentioned

View Answer

Answer: c [Reason:] The RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.
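
Putting the methods from these questions together, a loader skeleton might look roughly like the following. It leans on Hadoop's TextInputFormat and produces one single-field tuple per input line; the class name and the parsing are illustrative, not a complete production loader.

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    public class SimpleTextLoader extends LoadFunc {
        private RecordReader reader;
        private final TupleFactory tupleFactory = TupleFactory.getInstance();

        @Override
        public void setLocation(String location, Job job) throws IOException {
            // Communicate the load location to the underlying InputFormat.
            FileInputFormat.setInputPaths(job, location);
        }

        @Override
        public InputFormat getInputFormat() throws IOException {
            return new TextInputFormat();
        }

        @Override
        public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
            this.reader = reader;   // RecordReader handed over by Pig for this split
        }

        @Override
        public Tuple getNext() throws IOException {
            try {
                if (!reader.nextKeyValue()) {
                    return null;    // end of data for this split
                }
                String line = reader.getCurrentValue().toString();
                return tupleFactory.newTuple(line);   // one single-field tuple per line
            } catch (InterruptedException e) {
                throw new IOException(e);
            }
        }
    }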

9. __________ method tells LoadFunc which fields are required in the Pig script.
a) pushProjection()
b) relativeToAbsolutePath()
c) prepareToRead()
d) None of the mentioned

View Answer

Answer: a [Reason:] Pig will use the column index requiredField.index to communicate with the LoadFunc about the fields required by the Pig script.

10. A loader implementation should implement __________ if casts (implicit or explicit) from DataByteArray fields to other types need to be supported.
a) LoadPushDown
b) LoadMetadata
c) LoadCaster
d) All of the mentioned

View Answer

Answer: c [Reason:] LoadCaster has methods to convert byte arrays to specific types.

Hadoop MCQ Set 7

1. Stratos will be a polyglot _________ framework.
a) DaaS
b) PaaS
c) SaaS
d) RaaS

View Answer

Answer: b [Reason:] PaaS provides developers a cloud-based environment for developing, testing, and running scalable applications.

2. Which of the following supports random-writable and advance-able sparse bitsets ?
a) Stratos
b) Kafka
c) Sqoop
d) Lucene

View Answer

Answer: d [Reason:] Lucene.Net is a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.

3. ____________ is an open-source version control system.
a) Stratos
b) Kafka
c) Sqoop
d) Subversion

View Answer

Answer: d [Reason:] Subversion is an open-source version control system; many Apache projects, including Hadoop, have hosted their source code in Subversion.

4. ___________ is a distributed data warehouse system for Hadoop.
a) Stratos
b) Tajo
c) Sqoop
d) Lucene

View Answer

Answer: b [Reason:] Tajo is a robust, distributed data warehouse system for Apache Hadoop, designed for low-latency and scalable ad-hoc queries.

5. Point out the wrong statement :
a) Tuscan is a Service Component Architecture implementation
b) Tobago is a JSF based framework for web-applications
c) Traffic Server is a scalable and extensible HTTP proxy server and cache
d) None of the mentioned

View Answer

Answer: a [Reason:] The project is named Tuscany; Apache Tuscany is the Service Component Architecture implementation.

6. ___________ is a distributed, fault-tolerant, and high-performance real-time computation system.
a) Knife
b) Storm
c) Sqoop
d) Lucene

View Answer

Answer: b [Reason:] Storm provides strong guarantees on the processing of data.

7. Which of the following is a standard compliant XML Query processor ?
a) Whirr
b) VXQuery
c) Knife
d) Lens

View Answer

Answer: b [Reason:] VXQuery is a standards-compliant XML Query (XQuery) processor; Whirr, by contrast, provides code for running a variety of software services on cloud infrastructure.

8. Apache _________ is a project that enables development and consumption of REST style web services.
a) Wives
b) Wink
c) Wig
d) All of the mentioned

View Answer

Answer: b [Reason:] The core server runtime is based on the JAX-RS (JSR 311) standard.

9. __________ is a log collection and correlation software with reporting and alarming functionalities.
a) Lucene
b) ALOIS
c) Imphal
d) None of the mentioned

View Answer

Answer: b [Reason:] ALOIS is a log collection and correlation software with reporting and alarming functionalities; it was developed as an Apache Incubator project.

10. __________ is a non-blocking, asynchronous, event driven high performance web framework.
a) AWS
b) AWF
c) AWT
d) ASW

View Answer

Answer: b [Reason:] AWF was originally known as Deft and was renamed to AWF on 2012-02-15.

Hadoop MCQ Set 8

1. Drill is designed from the ground up to support high-performance analysis on the ____________ data.
a) semi-structured
b) structured
c) unstructured
d) none of the mentioned

View Answer

Answer: a [Reason:] Drill is an Apache open-source SQL query engine for Big Data exploration.

2. Point out the correct statement :
a) Drill provides plug-and-play integration with existing Apache Hive
b) Developers can use the sandbox environment to get a feel for the power and capabilities of Apache Drill by performing various types of queries
c) Drill is inspired by Google Dremel
d) All of the mentioned

View Answer

Answer: d [Reason:] Apache Drill is an open source, low latency SQL query engine for Hadoop and NoSQL.

3. ___________ includes Apache Drill as part of the Hadoop distribution.
a) Impala
b) MapR
c) Oozie
d) All of the mentioned

View Answer

Answer: b [Reason:] The MapR Sandbox with Apache Drill is a fully functional single-node cluster that can be used to get an overview on Apache Drill in a Hadoop environment.

4. MapR __________ Solution Earns Highest Score in Gigaom Research Data Warehouse Interoperability Report
a) SQL-on-Hadoop
b) Hive-on-Hadoop
c) Pig-on-Hadoop
d) All of the mentioned

View Answer

Answer: a [Reason:] Drill is a pioneer in delivering self-service data exploration capabilities on data stored in multiple formats in files or NoSQL databases.

5. Point out the wrong statement :
a) Hadoop is a prerequisite for Drill
b) Drill tackles rapidly evolving application driven schemas and nested data structures
c) Drill provides a single interface for structured and semi-structured data allowing you to readily query JSON files and HBase tables as easily as a relational table
d) All of the mentioned

View Answer

Answer: a [Reason:] Hadoop is not a prerequisite for Drill and users can start ramping up with Drill by running SQL queries directly on the local file system.

6. Drill integrates with BI tools using a standard __________ connector.
a) JDBC
b) ODBC
c) ODBC-JDBC
d) All of the mentioned

View Answer

Answer: b [Reason:] Drill conforms to the stringent ANSI SQL standards ensuring compatibility with existing BI environments as well as Hive deployments.
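
As an illustration of that BI-tool path, the sketch below queries Drill through its JDBC driver (Drill also ships an ODBC driver, which is what the answer refers to). The connection URL assumes an embedded/local Drillbit, and the file path and column names are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillJdbcSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:drill:zk=local";   // placeholder: Drill running in embedded mode
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 // Query a JSON file directly, with no schema defined up front.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT name, age FROM dfs.`/tmp/people.json` LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString("name") + "  " + rs.getInt("age"));
                }
            }
        }
    }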

7. Drill analyzes semi-structured/nested data coming from _________ applications.
a) RDBMS
b) NoSQL
c) NewSQL
d) None of the mentioned

View Answer

Answer: b [Reason:] Modern big data applications such as social, mobile, web and IoT deal with a larger number of users and larger amount of data than the traditional transactional applications.

8. Apache _________ provides direct queries on self-describing and semi-structured data in files.
a) Drill
b) Mahout
c) Oozie
d) All of the mentioned

View Answer

Answer: a [Reason:] Users can explore live data on their own as it arrives versus spending weeks or months on data preparation, modeling, ETL and subsequent schema management.

9. Drill provides a __________ like internal data model to represent and process data.
a) XML
b) JSON
c) TIFF
d) None of the mentioned

View Answer

Answer: b [Reason:] The flexibility of JSON data model allows Drill to query, without flattening, both simple and complex/nested data types as well as constantly changing application-driven schemas commonly seen with Hadoop/NoSQL applications.

10. Drill also provides intuitive extensions to SQL to work with _______ data types.
a) simple
b) nested
c) int
d) all of the mentioned

View Answer

Answer: b [Reason:] Users can also plug-and-play with Hive environments to enable ad-hoc low latency queries on existing Hive tables and reuse Hive’s metadata, hundreds of file formats and UDFs out of the box.