
Hadoop MCQ Set 1

1. ZooKeeper itself is intended to be replicated over a set of hosts called:
a) chunks
b) ensemble
c) subdomains
d) none of the mentioned

Answer: b [Reason:] As long as a majority of the servers are available, the ZooKeeper service will be available.
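
A minimal Java sketch of connecting to such an ensemble (the host names and session timeout are placeholders): the client is handed a comma-separated list of the ensemble's servers and fails over among them.

    import org.apache.zookeeper.ZooKeeper;

    public class EnsembleConnect {
        public static void main(String[] args) throws Exception {
            // The connection string lists several ensemble members; the client
            // connects to one and fails over to another if it becomes unavailable.
            ZooKeeper zk = new ZooKeeper(
                    "zk1:2181,zk2:2181,zk3:2181", // placeholder host names
                    15000,                        // session timeout in milliseconds
                    event -> System.out.println("state: " + event.getState()));
            System.out.println("session id: 0x" + Long.toHexString(zk.getSessionId()));
            zk.close();
        }
    }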

2. Point out the correct statement:
a) ZooKeeper can achieve high throughput and high latency numbers.
b) The fault tolerant ordering means that sophisticated synchronization primitives can be implemented at the client
c) The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access
d) All of the mentioned

Answer: c [Reason:] The performance aspects of ZooKeeper mean it can be used in large distributed systems.

3. Which of the following guarantees is provided by ZooKeeper?
a) Interactivity
b) Flexibility
c) Scalability
d) Reliability

Answer: d [Reason:] Once an update has been applied, it will persist from that time forward until a client overwrites the update.

4. ZooKeeper is especially fast in ___________ workloads.
a) write
b) read-dominant
c) read-write
d) none of the mentioned

Answer: b [Reason:] ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.

5. Point out the wrong statement:
a) Distributed applications use SQL to store important configuration information
b) The service maintains a record of all transactions, which can be used for higher-level abstractions, like synchronization primitives
c) ZooKeeper maintains a standard hierarchical namespace, similar to files and directories
d) ZooKeeper provides superior reliability through redundant services

Answer: a [Reason:] Distributed applications use ZooKeeper, not SQL, to store and mediate updates to important configuration information.

6. When a _______ is triggered, the client receives a packet saying that the znode has changed.
a) event
b) watch
c) row
d) value

Answer: b [Reason:] ZooKeeper supports the concept of watches: clients can set a watch on a znode.
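
A short sketch of setting a watch with the Java client, assuming a server on localhost:2181 and a placeholder znode path; exists() registers a one-shot watch that fires when the znode is created, deleted, or its data changes.

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class WatchDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // Registers a one-shot watch on /app/config (placeholder path); when
            // the znode changes, the client receives a packet and this callback runs once.
            Stat stat = zk.exists("/app/config",
                    event -> System.out.println("znode changed: " + event.getType()
                            + " at " + event.getPath()));
            System.out.println("znode currently " + (stat == null ? "absent" : "present"));

            Thread.sleep(60000); // keep the session alive long enough to observe an event
            zk.close();
        }
    }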

7. The underlying client-server protocol has changed in version _______ of ZooKeeper.
a) 2.0.0
b) 3.0.0
c) 4.0.0
d) 6.0.0

Answer: b [Reason:] Old pre-3.0.0 clients are not guaranteed to operate against upgraded 3.0.0 servers and vice-versa.

8. The Java package structure has changed from com.yahoo.zookeeper* to:
a) apache.zookeeper
b) org.apache.zookeeper
c) org.apache.zookeeper.package
d) all of the mentioned

Answer: b [Reason:] When the project moved from Yahoo to Apache, the package structure changed from com.yahoo.zookeeper* to org.apache.zookeeper.

9. A number of constants used in the client ZooKeeper API were renamed in order to reduce ________ collisions.
a) value
b) namespace
c) counter
d) none of the mentioned

Answer: b [Reason:] ZOOKEEPER-18 removed KeeperStateChanged, use KeeperStateDisconnected instead.

10. ZooKeeper allows distributed processes to coordinate with each other through registers, known as:
a) znodes
b) hnodes
c) vnodes
d) rnodes

Answer: a [Reason:] Every znode is identified by a path, with path elements separated by a slash.
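
A brief sketch of creating znodes by path with the Java client (server address, paths, and payload are illustrative); note that a znode's parent must exist before a child can be created under it.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ZnodeDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

            // Znode paths are slash-separated; create the parent first if absent.
            if (zk.exists("/services", false) == null) {
                zk.create("/services", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }
            // EPHEMERAL_SEQUENTIAL znodes vanish with the session and get a
            // monotonically increasing suffix, e.g. /services/worker-0000000001.
            String path = zk.create("/services/worker-", "payload".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            System.out.println("created " + path);
            zk.close();
        }
    }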

Hadoop MCQ Set 2

1. Kafka is comparable to traditional messaging systems such as:
a) Impala
b) ActiveMQ
c) BigTop
d) Zookeeper

Answer: b [Reason:] Kafka works well as a replacement for a more traditional message broker.

2. Point out the correct statement:
a) The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds
b) Activity tracking is often very high volume as many activity messages are generated for each user page view.
c) Kafka is often used for operational monitoring data
d) All of the mentioned

Answer: d [Reason:] This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

3. Many people use Kafka as a replacement for a ___________ solution.
a) log aggregation
b) compaction
c) collection
d) all of the mentioned

Answer: a [Reason:] Log aggregation typically collects physical log files off servers and puts them in a central place.

4. _______________ is a style of application design where state changes are logged as a time-ordered sequence of records.
a) Event sourcing
b) Commit Log
c) Stream Processing
d) None of the mentioned

Answer: a [Reason:] Kafka’s support for very large stored log data makes it an excellent backend for an application built in this style.

5. Point out the wrong statement:
a) Kafka can serve as a kind of external commit-log for a distributed system
b) The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data
c) Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster
d) All of the mentioned

Answer: d [Reason:] By default each line will be sent as a separate message.
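
As an illustration of that behavior, a small Java producer that, like the console client, sends each line of standard input as a separate message; the broker address and topic name are placeholders.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LinesToKafka {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                 BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Each line of input becomes one message, as with the console producer.
                    producer.send(new ProducerRecord<>("test", line));
                }
            }
        }
    }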

6. Kafka uses __________, so you first need to start a ZooKeeper server if you don’t already have one.
a) Impala
b) ActiveMQ
c) BigTop
d) Zookeeper

Answer: d [Reason:] You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.

7. __________ is the node responsible for all reads and writes for the given partition.
a) replicas
b) leader
c) follower
d) isr

Answer: b [Reason:] Each node will be the leader for a randomly selected portion of the partitions.

8. __________ is the subset of the replicas list that is currently alive and caught-up to the leader.
a) replicas
b) leader
c) follower
d) isr

Answer: d [Reason:] “isr” is the set of “in-sync” replicas.

9. Kafka uses key-value pairs in the ____________ file format for configuration.
a) RFC
b) Avro
c) Property
d) None of the mentioned

Answer: c [Reason:] These key values can be supplied either from a file or programmatically.
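
A minimal sketch of both routes using java.util.Properties; the file name shown is illustrative.

    import java.io.FileInputStream;
    import java.util.Properties;

    public class ConfigDemo {
        public static void main(String[] args) throws Exception {
            // From a file: producer.properties (placeholder name) holds lines like
            //   bootstrap.servers=localhost:9092
            Properties fromFile = new Properties();
            try (FileInputStream in = new FileInputStream("producer.properties")) {
                fromFile.load(in);
            }

            // Or programmatically, supplying the same key-value pairs in code.
            Properties inCode = new Properties();
            inCode.put("bootstrap.servers", "localhost:9092");

            System.out.println(fromFile.getProperty("bootstrap.servers"));
        }
    }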

10. __________ is the amount of time to keep a log segment before it is deleted.
a) log.cleaner.enable
b) log.retention
c) log.index.enable
d) log.flush.interval.message

Answer: b [Reason:] log.cleaner.enable must be set to true for log compaction to run.

Hadoop MCQ Set 3

1. A fully secure Hadoop cluster needs ___________.
a) SSH
b) SSL
c) Kerberos
d) REST

Answer: c [Reason:] Kerberos provides strong authentication for Hadoop services; it requires a client-side library and complex client-side configuration.

2. Point out the correct statement:
a) Knox is a stateless reverse proxy framework
b) Knox also intercepts REST/HTTP calls and provides authentication
c) Knox scales linearly by adding more Knox nodes as the load increases
d) All of the mentioned

Answer: d [Reason:] Knox can be deployed as a cluster of Knox instances that route requests to Hadoop’s REST APIs.

3. A __________ can route requests to multiple Knox instances.
a) collector
b) load balancer
c) comparator
d) all of the mentioned

Answer: b [Reason:] Knox is a stateless reverse proxy framework.

4. Knox provides perimeter _________ for Hadoop clusters.
a) reliability
b) security
c) flexibility
d) fault tolerance

Answer: b [Reason:] Knox sits at the perimeter of the cluster and provides a single, secured access point for Hadoop’s REST APIs.

5. Point out the wrong statement:
a) Knox eliminates the need for client software or client configuration and thus simplifies the access model
b) Knox simplifies access by extending Hadoop’s REST/HTTP services and encapsulating Kerberos within the cluster
c) Knox provides web vulnerability removal and other security services through a series of extensible interceptor pipelines
d) None of the mentioned

Answer: d [Reason:] Knox aggregates REST/HTTP calls to various components within the Hadoop ecosystem.

6. Knox integrates with prevalent identity management and _______ systems.
a) SSL
b) SSO
c) SSH
d) Kerberos

Answer: b [Reason:] Knox allows identities from those enterprise systems to be used for seamless, secure access to Hadoop clusters.

7. The easiest way to get an HDP cluster is to download the:
a) Hadoop
b) Sandbox
c) Dashboard
d) None of the mentioned

Answer: b [Reason:] The Hortonworks Sandbox packages a single-node HDP cluster as a ready-to-run virtual machine.

8. Apache Knox eliminates _______ edge node risks.
a) SSL
b) SSO
c) SSH
d) All of the mentioned

Answer: c [Reason:] Knox hides the cluster’s network topology, so SSH edge nodes need not be exposed.

9. Apache Knox accesses the Hadoop cluster over:
a) HTTP
b) TCP
c) ICMP
d) None of the mentioned

Answer: a [Reason:] Knox intercepts REST/HTTP calls, and supports LDAP/AD authentication, service-level authorization, and auditing.
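
An illustrative Java sketch of calling WebHDFS through a Knox gateway over HTTP(S); the host, topology name ("default"), credentials, and path are placeholders following Knox's documented URL pattern, and TLS trust configuration is omitted.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;

    public class KnoxWebHdfs {
        public static void main(String[] args) throws Exception {
            // Knox exposes cluster services under its gateway URL; everything
            // in this URL and the credentials below are placeholders.
            URL url = new URL(
                    "https://knox-host:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            String auth = Base64.getEncoder()
                    .encodeToString("guest:guest-password".getBytes());
            conn.setRequestProperty("Authorization", "Basic " + auth);

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON listing of /tmp
                }
            }
        }
    }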

10. Apache Knox provides a __________ REST API access point.
a) Single
b) Double
c) Multiple
d) Zero

Answer: a [Reason:] The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services.

Hadoop MCQ Set 4

1. ___________ provides Java-based indexing and search technology.
a) Solr
b) Lucene Core
c) Lucy
d) All of the mentioned

Answer: b [Reason:] Lucene provides spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
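
A compact index-and-search sketch against the Java Lucene API (Lucene 8/9-era class names; the field name and text are illustrative).

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class LuceneDemo {
        public static void main(String[] args) throws Exception {
            Directory dir = new ByteBuffersDirectory(); // in-memory index

            // Index a single document with one analyzed text field.
            try (IndexWriter writer = new IndexWriter(dir,
                    new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("body",
                        "ZooKeeper coordinates distributed systems", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search the index for a term and print matching documents.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(
                        new QueryParser("body", new StandardAnalyzer()).parse("zookeeper"),
                        10);
                for (ScoreDoc sd : hits.scoreDocs) {
                    System.out.println(searcher.doc(sd.doc).get("body"));
                }
            }
        }
    }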

2. Point out the correct statement:
a) Building PyLucene requires GNU Make, a recent version of Ant capable of building Java Lucene and a C++ compiler
b) PyLucene is supported on Mac OS X, Linux, Solaris and Windows
c) Use of setuptools is recommended for Lucene
d) All of the mentioned

Answer: d [Reason:] PyLucene requires Python version 2.x (x >= 3.5) and Java version 1.x (x >= 5).

3. ___________ is a high performance search server built using Lucene Core.
a) Solr
b) Lucene Core
c) Lucy
d) PyLucene

Answer: a [Reason:] Solr provides hit highlighting, faceted search, caching, replication, and a web admin interface.

4. ____________ is a subproject with the aim of collecting and distributing free materials for relevance testing and performance measurement.
a) OSR
b) OPR
c) ORP
d) ORS

Answer: c [Reason:] The Open Relevance Project (ORP) collects and distributes free materials for relevance testing and performance measurement.

5. Point out the wrong statement:
a) PyLucene is a Lucene port
b) PyLucene embeds a Java VM with Lucene into a Python process
c) The PyLucene Python extension, a Python module called lucene is machine-generated by JCC
d) PyLucene is built with JCC

Answer: a [Reason:] PyLucene is not a Lucene port but a Python wrapper around Java Lucene.

6. _______ is a Python port of the Core project.
a) Solr
b) Lucene Core
c) Lucy
d) PyLucene

Answer: d [Reason:] PyLucene is a Python extension for accessing Java Lucene.

7. The Lucene _________ is pleased to announce the availability of Apache Lucene 5.0.0 and Apache Solr 5.0.0.
a) PMC
b) RPC
c) CPM
d) All of the mentioned

Answer: a [Reason:] The PMC (Project Management Committee) governs an Apache project and makes its official releases.

8. ___________ is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
a) Lucene
b) Oozie
c) Lucy
d) All of the mentioned

Answer: a [Reason:] Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.

9. Lucene provides scalable, high-performance indexing of over ______ per hour on modern hardware.
a) 1 TB
b) 150GB
c) 10 GB
d) None of the mentioned

Answer: b [Reason:] Lucene offers powerful features through a simple API.

10. Lucene index size is roughly _______ the size of text indexed.
a) 10%
b) 20%
c) 50%
d) 70%

Answer: b [Reason:] Lucene provides incremental indexing as fast as batch indexing.

Hadoop MCQ Set 5

1. ___________ takes node and rack locality into account when deciding which blocks to place in the same split.
a) CombineFileOutputFormat
b) CombineFileInputFormat
c) TextFileInputFormat
d) None of the mentioned

Answer: b [Reason:] CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job.
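
A job-setup sketch using CombineTextInputFormat, the plain-text concrete subclass; the input path and split-size cap are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class CombineJob {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "combine-small-files");
            job.setJarByClass(CombineJob.class);

            // CombineTextInputFormat packs many small files into each split,
            // grouping blocks by node and rack to preserve locality.
            job.setInputFormatClass(CombineTextInputFormat.class);
            FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024); // 128 MB cap
            FileInputFormat.addInputPath(job, new Path("/data/small-files"));
            // ... mapper, reducer, and output settings as usual
        }
    }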

2. Point out the correct statement:
a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input
b) StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
c) The number depends on the size of the split and the length of the lines.
d) All of the mentioned

Answer: d [Reason:] Large XML documents that are composed of a series of “records” can be broken into these records using simple string or regular-expression matching to find start and end tags of records.

3. The key, a ____________, is the byte offset within the file of the beginning of the line.
a) LongReadable
b) LongWritable
c) IntWritable
d) All of the mentioned

Answer: b [Reason:] The value is the contents of the line, excluding any line terminators (newline, carriage return), and is packaged as a Text object.
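
A minimal mapper sketch showing these types in place; the output types and logic are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With TextInputFormat, the input key is the LongWritable byte offset of
    // the line and the value is the line's contents as a Text (terminators stripped).
    public class LineLengthMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit the line and the length in bytes of its UTF-8 encoding.
            context.write(new Text(line.toString()), new IntWritable(line.getLength()));
        }
    }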

4. _________ is the appropriate input format for reading the output produced by TextOutputFormat, Hadoop’s default OutputFormat.
a) KeyValueTextInputFormat
b) KeyValueTextOutputFormat
c) FileValueTextInputFormat
d) All of the mentioned

Answer: a [Reason:] TextOutputFormat writes tab-separated key-value lines; to interpret such files correctly, KeyValueTextInputFormat is appropriate.
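
A setup sketch for reading such files back; the separator shown is the default tab, and the configuration key is the Hadoop 2+ name.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class KeyValueJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // TextOutputFormat separates key and value with a tab by default,
            // which is also KeyValueTextInputFormat's default separator.
            conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

            Job job = Job.getInstance(conf, "read-key-value-lines");
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // The mapper now receives Text keys and Text values, split at the first tab.
        }
    }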

5. Point out the wrong statement:
a) Hadoop’s sequence file format stores sequences of binary key-value pairs
b) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects
c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects.
d) None of the mentioned

Answer: c [Reason:] It is SequenceFileAsBinaryInputFormat that retrieves the sequence file’s keys and values as opaque binary objects; SequenceFileAsTextInputFormat converts them to Text objects.

6. __________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects.
a) SequenceFile
b) SequenceFileAsTextInputFormat
c) SequenceAsTextInputFormat
d) All of the mentioned

Answer: b [Reason:] The conversion is performed by calling toString() on the keys and values, which makes sequence files suitable input for Streaming.

7. __________ class allows you to specify the InputFormat and Mapper to use on a per-path basis.
a) MultipleOutputs
b) MultipleInputs
c) SingleInputs
d) None of the mentioned

Answer: b [Reason:] One might be tab-separated plain text, the other a binary sequence file. Even if they are in the same format, they may have different representations, and therefore need to be parsed differently.
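
A sketch of the per-path wiring; the paths are placeholders and the mapper classes are empty stubs standing in for real per-format implementations.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiSourceJob {
        // Empty stubs standing in for real per-format mapper implementations.
        static class PlainTextMapper extends Mapper<LongWritable, Text, Text, Text> { }
        static class SequenceMapper extends Mapper<Text, Text, Text, Text> { }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();
            // Each path is paired with its own InputFormat and Mapper, so
            // differently formatted sources can feed the same job.
            MultipleInputs.addInputPath(job, new Path("/data/plain"),
                    TextInputFormat.class, PlainTextMapper.class);
            MultipleInputs.addInputPath(job, new Path("/data/binary"),
                    SequenceFileInputFormat.class, SequenceMapper.class);
        }
    }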

8. ___________ is an input format for reading data from a relational database, using JDBC.
a) DBInput
b) DBInputFormat
c) DBInpFormat
d) All of the mentioned

Answer: b [Reason:] DBInputFormat runs SQL queries against a database over JDBC and presents the result rows as input records to the mappers.
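
A setup sketch; the JDBC driver, connection URL, credentials, table, and column names are placeholders, and UserRecord is a hypothetical DBWritable representing one row.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class DbJob {
        // Hypothetical record type: one row of the "users" table.
        static class UserRecord implements Writable, DBWritable {
            int id;
            String name;
            public void readFields(ResultSet rs) throws SQLException {
                id = rs.getInt("id");
                name = rs.getString("name");
            }
            public void write(PreparedStatement ps) throws SQLException {
                ps.setInt(1, id);
                ps.setString(2, name);
            }
            public void readFields(DataInput in) throws IOException {
                id = in.readInt();
                name = in.readUTF();
            }
            public void write(DataOutput out) throws IOException {
                out.writeInt(id);
                out.writeUTF(name);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance();
            DBConfiguration.configureDB(job.getConfiguration(),
                    "com.mysql.jdbc.Driver",        // JDBC driver (placeholder)
                    "jdbc:mysql://db-host/mydb",    // connection URL (placeholder)
                    "user", "secret");              // credentials (placeholder)
            job.setInputFormatClass(DBInputFormat.class);
            DBInputFormat.setInput(job, UserRecord.class,
                    "users",        // table
                    null,           // WHERE conditions
                    "id",           // ORDER BY column
                    "id", "name");  // columns to read
        }
    }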

9. Which of the following is the default output format?
a) TextFormat
b) TextOutput
c) TextOutputFormat
d) None of the mentioned

Answer: c [Reason:] TextOutputFormat writes records as lines of text; its keys and values may be of any type, since they are turned into strings by calling toString().

10. Which of the following writes MapFiles as output?
a) DBInpFormat
b) MapFileOutputFormat
c) SequenceFileAsBinaryOutputFormat
d) None of the mentioned

Answer: b [Reason:] MapFileOutputFormat writes MapFiles as output; since the keys in a MapFile must be added in order, the reducer must emit keys in sorted order.