Hadoop MCQ Set 1

1. ________ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive


Answer: c [Reason:] Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs.

2. Point out the correct statement:
a) Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
b) Hive is a relational database with SQL support
c) Pig is a relational database with SQL support
d) All of the mentioned


Answer: a [Reason:] Hive is a SQL-based data warehouse system for Hadoop that facilitates data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

3. _________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
a) Scalding
b) HCatalog
c) Cascalog
d) All of the mentioned


Answer: c [Reason:] Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the name “Cascalog” is a contraction of Cascading and Datalog.

4. Hive also supports custom extensions written in:
a) C#
b) Java
c) C
d) C++


Answer: b [Reason:] Hive also supports custom extensions written in Java, including user-defined functions (UDFs) and serializer-deserializers for reading and optionally writing custom formats.

5. Point out the wrong statement:
a) Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
d) All of the mentioned


Answer: a [Reason:] EMR is Amazon’s offering, not Facebook’s. Rather than building Hadoop deployments manually on EC2 (Elastic Compute Cloud) clusters, users can spin up fully configured Hadoop installations using simple invocation commands, either through the AWS Web Console or through command-line tools.

6. ________ is the most popular high-level Java API in Hadoop Ecosystem
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading


Answer: d [Reason:] Cascading hides many of the complexities of MapReduce programming behind more intuitive pipes and data flow abstractions.

7. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned


Answer: a [Reason:] MapReduce provides a flexible and scalable foundation for analytics, from traditional reporting to leading-edge machine learning algorithms.
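
To make the computing model in question 7 concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps for a word count. It is plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Run map over every line, then reduce each group of values.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["Hadoop runs MapReduce", "MapReduce scales out"])
print(counts)
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data over the network; the sketch only shows the shape of the data flow.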

8. The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to:
a) SQL
b) JSON
c) XML
d) All of the mentioned


Answer: a [Reason:] Pig Latin, in essence, is designed to fill the gap between the declarative style of SQL and the low-level procedural style of MapReduce.

9. _______ jobs are optimized for scalability but not latency.
a) MapReduce
b) Drill
c) Oozie
d) Hive


Answer: a [Reason:] MapReduce jobs are optimized for scalability rather than latency; Hive queries are translated to MapReduce jobs to exploit that scalability.

10. ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa


Answer: c [Reason:] In the context of Hadoop, Avro can be used to pass data from one program or language to another.

Hadoop MCQ Set 2

1. The BatchEE project aims to provide a _________ implementation (aka JSR352)
a) JBat
b) JBatch
c) JBash
d) None of the mentioned


Answer: b [Reason:] BatchEE provides a set of useful extensions for this specification.

2. Point out the correct statement:
a) Blur is a search platform capable of searching massive amounts of data in a cloud computing environment.
b) Calcite is not a very good customizable engine for parsing
c) Brooklyn is a highly customizable engine for parsing
d) All of the mentioned


Answer: a [Reason:] Blur is an Apache Incubator project; Doug Cutting served as its incubation champion.

3. _________ allows database-like access, and in particular a SQL interface
a) JBatch
b) Calcite
c) Blur
d) All of the mentioned


Answer: b [Reason:] Calcite also provides advanced query optimization, for data not residing in a traditional database.

4. ___________ is a toolkit/application for converting between and editing common office file formats
a) Droids
b) DataFu
c) Corinthia
d) Ignite


Answer: c [Reason:] The toolkit is small, portable, and flexible, with minimal dependencies.

5. Point out the wrong statement:
a) Droids aims to be an intelligent standalone robot framework that allows users to create and extend existing droids
b) HTrace is a tracing framework intended for use with distributed systems written in Java
c) DataFu provides a collection of Hadoop MapReduce jobs and functions in higher level languages
d) None of the mentioned


Answer: d [Reason:] Statements a to c are all accurate. DataFu also provides Hadoop jobs for incremental data processing in MapReduce.

6. Ignite is a unified ______ data fabric providing high-performance, distributed in-memory data management.
a) Column
b) In-Memory
c) Row oriented
d) All of the mentioned


Answer: b [Reason:] Ignite can be used for various data sources and user applications.

7. _________ is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets.
a) Kylin
b) Lens
c) log4cxx2
d) MRQL


Answer: a [Reason:] Kylin provides a SQL interface and multidimensional analysis (OLAP) on Hadoop. MRQL, by contrast, is a query processing and optimization system for large-scale, distributed data analysis.

8. NiFi is a dataflow system based on the concepts of ________ programming.
a) structured
b) relational
c) set
d) flow-based


Answer: d [Reason:] NiFi entered the Apache Incubator with Billie Rinaldi among its mentors.

9. __________ is a columnar storage format for Hadoop.
a) Ranger
b) Parquet
c) REEF
d) None of the mentioned


Answer: b [Reason:] Parquet stores data column by column rather than row by row. The Ranger project, by contrast, is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform.

10. Ripple is a browser-based mobile phone emulator designed to aid in the development of _______ based mobile applications.
a) JavaScript
b) Java
c) C++
d) HTML5


Answer: d [Reason:] Ripple is a cross platform and cross runtime testing/debugging tool.

Hadoop MCQ Set 3

1. ____________ is a query processing and optimization system for large-scale, distributed data analysis.
a) MRQL
b) NiFi
c) OpenAz
d) ODF Toolkit


Answer: a [Reason:] MRQL is built on top of Apache Hadoop, Hama, Spark, and Flink.

2. Point out the correct statement:
a) SAMOA provides a collection of distributed streaming algorithms
b) REEF is a cross platform and cross runtime testing/debugging tool
c) Sentry is a highly modular system for providing fine grained role
d) All of the mentioned


Answer: a [Reason:] SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression.

3. ________ is a columnar storage format for Hadoop.
a) MRQL
b) NiFi
c) OpenAz
d) Parquet


Answer: d [Reason:] Parquet is the columnar storage format. NiFi, by contrast, is a dataflow system based on the concepts of flow-based programming.

4. ________ is a scale-out computing fabric that eases the development of Big Data applications
a) MRQL
b) NiFi
c) REEF
d) Ripple


Answer: c [Reason:] REEF stands for Retainable Evaluator Execution Framework.

5. Point out the wrong statement:
a) OpenAz is a browser-based mobile phone emulator
b) Ripple is a cross-platform and cross-runtime testing/debugging tool
c) Ripple currently supports such runtimes as Cordova, WebWorks, and the Mobile Web
d) All of the mentioned


Answer: a [Reason:] It is Ripple, not OpenAz, that is a browser-based mobile phone emulator designed to aid in the development of HTML5-based mobile applications.

6. Which of the following is a monitoring solution for Hadoop?
a) Sirona
b) Sentry
c) Slider
d) Streams


Answer: a [Reason:] Sirona is the monitoring solution. Slider, by contrast, is a collection of tools and technologies to package, deploy, and manage long-running applications on Apache Hadoop YARN clusters.

7. Apache ________ is a lightweight server for ActivityStreams.
a) Sirona
b) Taverna
c) Slider
d) Streams


Answer: d [Reason:] Streams is the ActivityStreams server. Taverna, by contrast, is a domain-independent suite of tools used to design and execute data-driven workflows.

8. Which of the following provides an extensible, modern, and functional API leveraging SE, ME, and EE environments?
a) Sirona
b) Taverna
c) Tamaya
d) Streams


Answer: c [Reason:] Tamaya is a highly flexible configuration solution based on a modular, extensible, and injectable key/value-based design.

9. __________ is an abstraction over Apache Hadoop YARN that reduces the complexity of developing distributed applications
a) Wave
b) Twill
c) Usergrid
d) None of the mentioned


Answer: b [Reason:] Twill allows developers to focus more on their business logic.

10. A _________ is a hosted, live, concurrent data structure for rich communication.
a) Wave
b) Twill
c) Usergrid
d) None of the mentioned


Answer: a [Reason:] A wave is the hosted, live, concurrent data structure. Usergrid, by contrast, is a Backend-as-a-Service (BaaS) composed of an integrated database (Cassandra), application layer, and client tier with SDKs for developers.

Hadoop MCQ Set 4

1. Which of the following is a collaborative data analytics and visualization tool?
a) ACE
b) Abdera
c) Zeppelin
d) Accumulo


Answer: c [Reason:] Zeppelin is used with general-purpose data processing systems such as Apache Spark, Apache Flink, etc.

2. Point out the wrong statement:
a) Buildr is a simple and intuitive build system for Java projects, written in Ruby
b) Celix is an OSGi-like implementation in C with a distinct focus on interoperability between Java and C
c) The Bean Validation project will create an implementation of Bean Validation as defined by the Java EE specifications
d) None of the mentioned


Answer: d [Reason:] Statements a to c are all accurate. Beehive, another Apache project, provides an extensible Java application framework with an integrated metadata-driven programming model for web services.

3. _____________ is a software distribution framework based on OSGi
a) ACE
b) Abdera
c) Zeppelin
d) Accumulo


Answer: a [Reason:] ACE allows you to manage and distribute artifacts.

4. ___________ is forge software for the development of software projects.
a) Oozie
b) Allura
c) Ambari
d) All of the mentioned


Answer: b [Reason:] Allura includes source control systems, issue tracking, discussion, wiki, and other software project management tools.

5. Point out the correct statement:
a) Ambari is a monitoring, administration and lifecycle management project for Apache Hadoop clusters
b) The Amber project will deliver a Java development framework mainly aimed to build OAuth-aware applications
c) Bigtop is a project for the development of packaging and tests of the Hadoop ecosystem
d) All of the mentioned


Answer: d [Reason:] All three statements are accurate. Amber graduated with the name Apache Oltu.

6. ___________ is a software development collaboration tool.
a) Buildr
b) Cassandra
c) Bloodhound
d) All of the mentioned


Answer: c [Reason:] Bloodhound is the collaboration tool. Buildr, by contrast, is a simple and intuitive build system for Java projects, written in Ruby.

7. _____________ is an IaaS (“Infrastructure as a Service”) cloud orchestration platform.
a) CloudStack
b) Cazerra
c) Click
d) All of the mentioned


Answer: a [Reason:] CloudStack is the IaaS platform. Click, by contrast, is a component-based Java web framework.

8. Apache __________ is a platform for building native mobile applications using HTML, CSS and JavaScript (formerly Phonegap).
a) Cazerra
b) Cordova
c) CouchDB
d) All of the mentioned


Answer: b [Reason:] The project entered incubation as Callback, but decided to change its name to Cordova on 2011-11-28.

9. ___________ is a Java library for writing, testing, and running pipelines of MapReduce jobs on Apache Hadoop.
a) cTAKES
b) Crunch
c) CouchDB
d) None of the mentioned


Answer: b [Reason:] Crunch is the pipeline library. cTAKES (clinical Text Analysis and Knowledge Extraction System), by contrast, is a natural language processing tool for information extraction from electronic medical record clinical free text.

10. Which of the following projects will create a SOA services framework?
a) DeltaCloud
b) CXF
c) DeltaSpike
d) None of the mentioned


Answer: b [Reason:] CXF is the services framework. DeltaSpike, by contrast, is a collection of JSR-299 (CDI) extensions for building applications on the Java SE and EE platforms.

Hadoop MCQ Set 5

1. Hadoop comes with a set of ________ for data I/O.
a) methods
b) commands
c) classes
d) none of the mentioned


Answer: d [Reason:] The missing word is “primitives”: Hadoop I/O consists of primitives for serialization and deserialization.

2. Point out the correct statement:
a) The sequence file also can contain a “secondary” key-value list that can be used as file metadata
b) SequenceFile formats share a header that contains some information which allows the reader to recognize its format
c) There are key and value class names that allow the reader to instantiate those classes, via reflection, for reading
d) All of the mentioned


Answer: d [Reason:] In contrast with other persistent key-value data structures like B-Trees, you can’t seek to a specified key to edit, add, or remove it.

3. Apache Hadoop’s ___________ provides a persistent data structure for binary key-value pairs.
a) GetFile
b) SequenceFile
c) Putfile
d) All of the mentioned


Answer: b [Reason:] SequenceFile is append-only.

4. How many formats of SequenceFile are present in Hadoop I/O?
a) 2
b) 3
c) 4
d) 5


Answer: b [Reason:] SequenceFile has 3 available formats: an “Uncompressed” format, a “Record Compressed” format, and a “Block-Compressed” format.

5. Point out the wrong statement:
a) The data file contains all the key, value records but key N + 1 must be greater than or equal to the key N
b) Sequence file is a kind of Hadoop file-based data structure
c) Map file type is splittable as it contains a sync point after several records
d) None of the mentioned


Answer: c [Reason:] A MapFile is also a Hadoop file-based data structure; it differs from a SequenceFile in that its records must be ordered by key.

6. Which of the following formats is more compression-aggressive?
a) Partition Compressed
b) Record Compressed
c) Block-Compressed
d) Uncompressed


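Answer: c [Reason:] Block compression compresses many records together, so it can exploit redundancy across records. Separately, the SequenceFile metadata key-value list can be just a Text/Text pair, and is written to the file when the SequenceFile is initialized.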
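
The difference between the record-compressed and block-compressed formats can be illustrated with a rough sketch. It uses Python's zlib on made-up records rather than the Hadoop SequenceFile classes, and it ignores headers, record lengths, and sync markers, so the sizes are only indicative. The record contents and variable names are assumptions for this example.

```python
import zlib

# Hypothetical records: many small, similar key-value pairs.
records = [
    (f"user{i:04d}".encode(), b'{"status": "active", "plan": "basic"}')
    for i in range(200)
]

uncompressed_size = sum(len(k) + len(v) for k, v in records)

# Record-style: each value is compressed on its own; keys stay as they are.
record_compressed_size = sum(len(k) + len(zlib.compress(v)) for k, v in records)

# Block-style: keys are batched together, values are batched together,
# and each batch is compressed once.
keys_block = zlib.compress(b"".join(k for k, _ in records))
values_block = zlib.compress(b"".join(v for _, v in records))
block_compressed_size = len(keys_block) + len(values_block)

print("uncompressed:     ", uncompressed_size)
print("record-compressed:", record_compressed_size)
print("block-compressed: ", block_compressed_size)
```

Compressing a whole batch lets the compressor reuse patterns shared across records, which per-record compression cannot see.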

7. The __________ is a directory that contains two SequenceFiles.
a) ReduceFile
b) MapperFile
c) MapFile
d) None of the mentioned


Answer: c [Reason:] The two SequenceFiles are the data file (“/data”) and the index file (“/index”).

8. The ______ file is populated with the key and a LongWritable that contains the starting byte position of the record.
a) Array
b) Index
c) Immutable
d) All of the mentioned


Answer: b [Reason:] The index doesn’t contain all the keys, just a fraction of them.
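
The idea behind questions 7 and 8, a sorted data file plus a sparse index of keys and byte offsets, can be sketched in a few lines. This is an in-memory toy in plain Python, not the Hadoop MapFile API; INDEX_INTERVAL, build, lookup, and the tab-separated record layout are all assumptions made for the illustration.

```python
import bisect

INDEX_INTERVAL = 4  # hypothetical: index every 4th key


def build(sorted_records):
    """Lay records out in one bytes 'data file' and build a sparse index
    of (key, starting byte offset) pairs."""
    data = bytearray()
    index_keys, index_offsets = [], []
    for i, (key, value) in enumerate(sorted_records):
        if i % INDEX_INTERVAL == 0:
            index_keys.append(key)
            index_offsets.append(len(data))
        data += f"{key}\t{value}\n".encode()
    return bytes(data), index_keys, index_offsets


def lookup(data, index_keys, index_offsets, key):
    """Jump to the closest indexed key at or before `key`, then scan forward."""
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first record
    for line in data[index_offsets[pos]:].decode().splitlines():
        k, v = line.split("\t")
        if k == key:
            return v
        if k > key:
            break  # records are sorted, so the key cannot appear later
    return None


records = [(f"k{i:02d}", f"value-{i}") for i in range(10)]
data, index_keys, index_offsets = build(records)
print(index_keys)
print(lookup(data, index_keys, index_offsets, "k06"))
```

Because only a fraction of the keys is indexed, the index stays small enough to hold in memory, at the cost of a short forward scan per lookup.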

9. The _________ has just the value field, append(value), and the key is a LongWritable that contains the record number, count + 1.
a) SetFile
b) ArrayFile
c) BloomMapFile
d) None of the mentioned


Answer: b [Reason:] The SetFile, by contrast, has just the key field, append(key), instead of append(key, value), and the value is always the NullWritable instance.
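
The relationship between the three append signatures can be sketched with toy in-memory classes. These are not the Hadoop classes: ToyMapFile, ToyArrayFile, and ToySetFile are hypothetical names, None stands in for NullWritable, and the record counter here starts at 0 as an assumption of the sketch.

```python
class ToyMapFile:
    """In-memory stand-in for a MapFile writer: keys must arrive in sorted order."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        if self.records and key < self.records[-1][0]:
            raise ValueError("keys must be appended in sorted order")
        self.records.append((key, value))


class ToyArrayFile(ToyMapFile):
    """The caller supplies only the value; the key is the running record number."""

    def append(self, value):
        super().append(len(self.records), value)


class ToySetFile(ToyMapFile):
    """The caller supplies only the key; the value is a fixed placeholder."""

    def append(self, key):
        super().append(key, None)  # None plays the role of NullWritable


arr = ToyArrayFile()
arr.append("first")
arr.append("second")

names = ToySetFile()
names.append("apple")
names.append("pear")

print(arr.records)
print(names.records)
```

Both variants reuse the sorted key-value storage underneath and only narrow what the caller has to pass in.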

10. The ____________ data file is based on the Avro serialization framework, which was primarily created for Hadoop.
a) Oozie
b) Avro
c) cTakes
d) Lucene


Answer: b [Reason:] Avro is a splittable data format with a metadata section at the beginning followed by a sequence of Avro-serialized objects.