Interview MCQ Set 1
1. __________ produces box-and-whisker plots.
a) xyplot
b) dotplot
c) barchart
d) bwplot
Answer: d [Reason:] bwplot produces box-and-whisker plots, whereas dotplot produces Cleveland dot plots.
2. __________ produces bivariate scatterplots or time-series plots.
a) xyplot
b) dotplot
c) barchart
d) bwplot
Answer: a [Reason:] xyplot produces bivariate scatterplots or time-series plots, whereas barchart produces bar plots.
3. Annotation of plots in any plotting system involves adding points, lines, or text to the plot, in addition to customizing axis labels or adding titles. Different plotting systems have different sets of functions for annotating plots in this way. Which of the following functions can be used to annotate the panels in a multi-panel lattice plot?
a) points()
b) panel.abline()
c) lines()
d) axis()
Answer: b [Reason:] panel.abline() is one of the most commonly used panel functions. points(), lines(), and axis() belong to the base graphics system.
4. ____________ produces one-dimensional scatterplots.
a) xyplot
b) stripplot
c) barchart
d) bwplot
Answer: b [Reason:] stripplot, along with the other high-level lattice functions, responds to a common set of arguments.
5. Which of the following functions can be used to finely control the appearance of all lattice plots?
a) par()
b) print.trellis()
c) splom()
d) trellis.par.set()
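Answer: d [Reason:] trellis.par.set() changes the graphical parameter settings that lattice plots use, whereas par() affects only base graphics. All high-level functions in lattice are generic.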
6. What is ggplot2 an implementation of?
a) the Grammar of Graphics developed by Leland Wilkinson
b) 3D visualization system
c) the S language originally developed by Bell Labs
d) the base plotting system in R
Answer: a [Reason:] The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots.
7. For barchart and _________, non-trivial methods exist for tables and arrays, documented at barchart.table.
a) scatterplot
b) dotplot
c) xyplot
d) all of the mentioned
Answer: b [Reason:] The numeric methods are equivalent to a call with no left hand side and no conditioning variables in the formula.
8. What is a geom in the ggplot2 system?
a) a plotting object like point, line, or other shape
b) a method for making conditioning plots
c) a method for mapping data to attributes like color and size
d) a statistical transformation
Answer: a [Reason:] A geom is the geometric object drawn to represent the data, such as a point, line, or bar. The bar geom, for example, is used to produce 1d area plots.
9. The horizontal logical flag is applicable to which of the following plots?
a) scatterplot
b) dotplot
c) xyplot
d) all of the mentioned
Answer: b [Reason:] The logical flag is applicable to bwplot, dotplot, barchart, and stripplot.
10. ___________ is used to determine what is plotted for each group.
a) panel.expose
b) panel.impose
c) panel.superpose
d) all of the mentioned
Answer: c [Reason:] panel.superpose can be combined with different panel.groups functions.
Interview MCQ Set 2
1. Amazon EMR also allows you to run multiple versions concurrently, allowing you to control your ___________ version upgrade.
a) Pig
b) Windows Server
c) Hive
d) Ubuntu
Answer: c [Reason:] Amazon EMR supports several versions of Hive, which you can install on any running cluster.
2. Point out the correct statement:
a) Amazon Elastic MapReduce (Amazon EMR) provides support for Apache Hive.
b) Pig extends the SQL paradigm by including serialization formats and the ability to invoke mapper and reducer scripts
c) The Amazon Hive default input format is text
d) All of the mentioned
Answer: a [Reason:] With Hive 0.13.1 on Amazon EMR, certain options introduced in previous versions of Hive on EMR have been removed in favor of greater parity with Apache Hive. For example, the -x option was removed.
3. The Amazon EMR default input format for Hive is:
a) org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
b) org.apache.hadoop.hive.ql.iont.CombineHiveInputFormat
c) org.apache.hadoop.hive.ql.io.CombineFormat
d) All of the mentioned
Answer: a [Reason:] You can specify the hive.base.inputformat option in Hive to select a different file format.
4. Hadoop clusters running on Amazon EMR use ______ instances as virtual Linux servers for the master and slave nodes
a) EC2
b) EC3
c) EC4
d) None of the mentioned
Answer: a [Reason:] Amazon EMR has made enhancements to Hadoop and other open-source applications to work seamlessly with AWS.
5. Point out the wrong statement:
a) Apache Hive saves Hive log files to /tmp/{user.name}/ in a file named hive.log
b) Amazon EMR saves Hive logs to /mnt/var/log/apps/
c) In order to support concurrent versions of Hive, the version of Hive you run determines the log file name
d) None of the mentioned
Answer: d [Reason:] If you have many GZip files in your Hive cluster, you can optimize performance by passing multiple files to each mapper.
6. Amazon EMR uses Hadoop processing combined with several __________ products.
a) AWS
b) ASQ
c) AMR
d) AWES
Answer: a [Reason:] Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently.
7. ___________ is an RPC framework that defines a compact binary serialization format used to persist data structures for later analysis.
a) Pig
b) Hive
c) Thrift
d) None of the mentioned
Answer: c [Reason:] Thrift is an RPC framework that defines a compact binary serialization format used to persist data structures for later analysis.
8. Impala on Amazon EMR requires _________ running Hadoop 2.x or greater.
a) AMS
b) AMI
c) AWR
d) All of the mentioned
Answer: b [Reason:] Impala is an open source tool in the Hadoop ecosystem for interactive, ad hoc querying using SQL syntax.
9. Impala executes SQL queries using a _________ engine.
a) MAP
b) MPP
c) MPA
d) None of the mentioned
Answer: b [Reason:] Impala avoids Hive’s overhead from creating MapReduce jobs, giving it faster query times than Hive.
10. Amazon EMR clusters can read and process Amazon _________ streams directly.
a) Kinet
b) kinematics
c) Kinesis
d) None of the mentioned
Answer: c [Reason:] The Amazon EMR connector for Amazon Kinesis uses the DynamoDB database as its backing for checkpointing metadata.
Interview MCQ Set 3
1. Which of the following is a primitive data type in Avro?
a) null
b) boolean
c) float
d) all of the mentioned
Answer: d [Reason:] Primitive type names are also defined type names.
2. Point out the correct statement:
a) Records use the type name “record” and support three attributes
b) Enums are represented using JSON arrays
c) Avro data is always serialized with its schema
d) All of the mentioned
Answer: a [Reason:] A record is encoded by encoding the values of its fields in the order that they are declared.
3. Avro supports ______ kinds of complex types.
a) 3
b) 4
c) 6
d) 7
Answer: c [Reason:] Avro supports six kinds of complex types: records, enums, arrays, maps, unions and fixed.
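The six complex types named above can all appear in one schema. The following is a minimal sketch, not taken from any real project: the schema is written as a Python dict in Avro's JSON schema form, the names Example, Suit, MD5 and all field names are hypothetical, and complex_kinds is a helper invented here purely to list which complex kinds a schema uses.

```python
# Hypothetical Avro schema written as a Python dict (it mirrors the JSON form).
# All names below (Example, Suit, MD5, field names) are made up for illustration.
schema = {
    "type": "record",  # 1. record
    "name": "Example",
    "fields": [
        {"name": "suit",
         "type": {"type": "enum", "name": "Suit",
                  "symbols": ["SPADES", "HEARTS"]}},          # 2. enum
        {"name": "scores",
         "type": {"type": "array", "items": "long"}},         # 3. array
        {"name": "attrs",
         "type": {"type": "map", "values": "string"}},        # 4. map
        {"name": "note", "type": ["null", "string"]},         # 5. union (a JSON array)
        {"name": "digest",
         "type": {"type": "fixed", "name": "MD5", "size": 16}},  # 6. fixed
    ],
}

COMPLEX = {"record", "enum", "array", "map", "fixed"}


def complex_kinds(node, found=None):
    """Collect the complex type kinds that appear anywhere in a schema."""
    found = set() if found is None else found
    if isinstance(node, list):
        # In Avro's JSON schema form a union is written as a JSON array.
        found.add("union")
        for branch in node:
            complex_kinds(branch, found)
    elif isinstance(node, dict):
        kind = node.get("type")
        if isinstance(kind, str) and kind in COMPLEX:
            found.add(kind)
        else:
            complex_kinds(kind, found)
        for field in node.get("fields", []):
            complex_kinds(field["type"], found)
        for key in ("items", "values"):
            if key in node:
                complex_kinds(node[key], found)
    return found


print(sorted(complex_kinds(schema)))
```

Primitive types such as long and string are ignored by the helper, so only the complex kinds are reported.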
4. ________ are encoded as a series of blocks.
a) Arrays
b) Enum
c) Unions
d) Maps
Answer: a [Reason:] Each block consists of a long count value, followed by that many array items. A block with count zero indicates the end of the array. Each item is encoded per the array’s item schema.
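The block layout described above can be sketched in a few lines. This is a simplified illustration rather than the Avro library's own code: the function names zigzag_varint and encode_long_array and the block_size parameter are assumptions made here, items are limited to 64-bit longs, and the optional negative-count form that also carries a block byte size is left out.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes int and long:
    zig-zag mapping first, then a variable-length base-128 encoding."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes become small codes
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_long_array(items, block_size=2):
    """Encode a list of longs as a series of blocks.

    Each block is a long count followed by that many items; a block with
    count zero ends the array. block_size is a hypothetical tuning knob.
    """
    out = bytearray()
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        out += zigzag_varint(len(block))
        for item in block:
            out += zigzag_varint(item)
    out += zigzag_varint(0)  # end-of-array marker
    return bytes(out)


print(encode_long_array([3, 27]).hex())  # prints 04063600
```

Splitting into blocks lets a writer emit a large array without knowing its total length up front.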
5. Point out the wrong statement:
a) Records, enums and fixed are named types
b) Unions may immediately contain other unions
c) A namespace is a dot-separated sequence of such names
d) All of the mentioned
Answer: b [Reason:] Unions may not immediately contain other unions.
6. ________ instances are encoded using the number of bytes declared in the schema.
a) Fixed
b) Enum
c) Unions
d) Maps
Answer: a [Reason:] Except for unions, the JSON encoding is the same as is used to encode field default values.
7. ________ permits data written by one system to be efficiently sorted by another system.
a) Complex Data type
b) Order
c) Sort Order
d) All of the mentioned
Answer: c [Reason:] Avro binary-encoded data can be efficiently ordered without deserializing it to objects.
8. _____________ are used between blocks to permit efficient splitting of files for MapReduce processing.
a) Codec
b) Data Marker
c) Synchronization markers
d) All of the mentioned
Answer: c [Reason:] Avro includes a simple object container file format.
9. The __________ codec uses Google’s Snappy compression library.
a) null
b) snappy
c) deflate
d) none of the mentioned
Answer: b [Reason:] Snappy is a compression library developed at Google, and, like many technologies that come from Google, Snappy was designed to be fast.
10. Avro messages are framed as a list of _________
a) buffers
b) frames
c) rows
d) none of the mentioned
Answer: a [Reason:] Avro messages are framed as a list of buffers. Framing is a layer between messages and the transport. It exists to optimize certain operations.
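Framing can be illustrated with a small sketch. It assumes the layout given in the Avro specification, where each buffer is a four-byte big-endian length followed by that many bytes and a zero-length buffer terminates the message. The function names frame_message and unframe_message and the max_buffer parameter are hypothetical, chosen only for this example.

```python
import struct


def frame_message(payload: bytes, max_buffer: int = 8) -> bytes:
    """Split a message into length-prefixed buffers and terminate it
    with a zero-length buffer. max_buffer is an illustrative limit."""
    out = bytearray()
    for start in range(0, len(payload), max_buffer):
        chunk = payload[start:start + max_buffer]
        out += struct.pack(">I", len(chunk)) + chunk  # 4-byte big-endian length
    out += struct.pack(">I", 0)  # zero-length buffer ends the message
    return bytes(out)


def unframe_message(framed: bytes) -> bytes:
    """Reassemble the original message from its framed buffers."""
    payload = bytearray()
    pos = 0
    while True:
        (length,) = struct.unpack_from(">I", framed, pos)
        pos += 4
        if length == 0:
            return bytes(payload)
        payload += framed[pos:pos + length]
        pos += length


framed = frame_message(b"hello avro")
print(unframe_message(framed))
```

Because every buffer announces its own length, a receiver can read or skip a message without parsing its contents.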
Interview MCQ Set 4
1. Which of the following transport protocols is used for POP3 and IMAP access in Gmail?
a) TTP
b) TLS
c) HTTP
d) All of the mentioned
Answer: b [Reason:] TLS stands for Transport Layer Security.
2. Point out the wrong statement:
a) Gmail is written to look like an Internet chat utility
b) Gmail tends to perform most of its operations such as archiving on the conversation as a whole
c) When a conversation gets to be 50 messages long, Gmail splits the conversation into a second section
d) All of the mentioned
Answer: c [Reason:] When a conversation gets to be 100 messages long, Gmail splits the conversation into a second section.
3. Which of the following prototypical POP3 Webmail mail retrieval services was established in 1997?
a) Gmail
b) Mail2Web
c) Yahoo Mail
d) All of the mentioned
Answer: b [Reason:] From the Mail2Web interface, you can read messages, reply, and create new messages.
4. Mail2Web also has a mobile e-mail service based on Exchange called __________ Mobile E-mail.
a) Mail2.com
b) Mail2Web.com
c) MailWeb.com
d) All of the mentioned
Answer: b [Reason:] Mail2Web.com provides online access to any POP3 account.
5. Point out the wrong statement:
a) In Gmail, you cannot construct searches with multiple operators using the Advanced Search feature
b) Google uses an advertiser-driven model to supply its free service to users
c) RSS content can be read in most modern browsers
d) None of the mentioned
Answer: a [Reason:] In its search function, Google uses the keywords from your search and your search history to match sponsors to you.
6. Which of the following is Microsoft’s Webmail offering, with localized versions in 36 languages?
a) AOL Mail
b) Windows Live Hotmail
c) Yahoo Mail
d) None of the mentioned
Answer: b [Reason:] Windows Live Hotmail is one of the central applications in Microsoft’s Windows Live product portfolio.
7. Which of the following browsers is not supported for the Hotmail service?
a) Firefox
b) Safari
c) Chrome
d) All of the mentioned
Answer: b [Reason:] Windows Live Hotmail was created with Ajax technology.
8. Which of the following Hotmail features lets you set a spam filter directly with your mouse?
a) 1-Click Filters
b) 2-Click Filters
c) Mail Filters
d) All of the mentioned
Answer: a [Reason:] Hotmail has a strong feature set.
9. Which of the following adds the ability to synchronize messages, contacts, and calendars on any mobile phone that has ActiveSync?
a) Exchange ActiveSync
b) Exchange PassiveSync
c) EmailRackspace
d) All of the mentioned
Answer: a [Reason:] Hotmail also can be viewed in Microsoft Office Outlook using the Outlook Connector or using Windows Live for Windows Mobile phones on that phone’s operating system.
10. Yahoo! added a version of the user interface based on Ajax that looks like a form of _____________
a) Gmail
b) Microsoft Outlook
c) AOL Mail
d) None of the mentioned
Answer: b [Reason:] This Ajax interface is based on the work of Oddpost, which the company acquired in 2004.
Interview MCQ Set 5
1. __________ is used when you have variables that form rows instead of columns.
a) tidy()
b) spread()
c) separate()
d) all of the mentioned
Answer: b [Reason:] You need spread() less frequently than gather() or separate().
2. Point out the correct statement:
a) tidyr and dplyr packages do not make use of the pipe operator
b) tidyr does less than reshape2
c) tidyr provides ability to string multiple functions together by incorporating %
d) all of the mentioned
Answer: b [Reason:] Just as reshape2 did less than reshape, tidyr does less than reshape2.
3. Which of the following merges two variables into one?
a) spread()
b) gather()
c) separate()
d) unite()
Answer: d [Reason:] The unite() function is a convenience function to paste together multiple variable values into one.
4. How many functions exist for wrangling the data with dplyr package?
a) one
b) seven
c) three
d) five
Answer: b [Reason:] dplyr provides seven main functions for tidying your messy data.
5. Point out the correct statement:
a) gather() makes “long” data wider
b) tidyr is a reframing of reshape designed to accompany the tidy data framework
c) there are two fundamental verbs of data tidying
d) none of the mentioned
Answer: c [Reason:] In particular, built-in methods only work for data frames, and tidyr provides no margins or aggregation.
6. ________ function is similar to the existing subset() function in R
a) rename
b) filter
c) set
d) subset
Answer: b [Reason:] The filter() function is used to extract subsets of rows from a data frame.
7. The ______ operator allows you to string operations in a left-to-right fashion
a) %>%>
b) %>%
c) >%>%
d) all of the mentioned
Answer: b [Reason:] The pipeline operator %>% is very handy for stringing together multiple dplyr functions in a sequence of operations.
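The left-to-right chaining idea can be mimicked outside R. The sketch below is only a rough Python analogy, not dplyr or the %>% operator itself: the pipe helper, the sample rows, and the lambda steps are all hypothetical, and the comments merely point at the dplyr verb each step resembles.

```python
from functools import reduce


def pipe(value, *steps):
    """Pass value through each step in order, left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Made-up sample data standing in for a data frame.
rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 9},
    {"name": "c", "score": 7},
]

result = pipe(
    rows,
    lambda rs: [r for r in rs if r["score"] > 5],               # resembles filter()
    lambda rs: [{**r, "double": r["score"] * 2} for r in rs],   # resembles mutate()
    lambda rs: sorted(rs, key=lambda r: r["score"]),            # resembles arrange()
)

print(result)
```

Reading the steps top to bottom matches the order in which they run, which is the readability benefit the pipe operator is known for.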
8. ________ adds new variables/columns or transforms existing variables
a) mutate
b) add
c) apped
d) arrange
Answer: a [Reason:] mutate adds new variables or transforms existing ones, whereas arrange is used to reorder rows of a data frame.
9. _________ extracts a subset of rows from a data frame based on logical conditions.
a) rename
b) filter
c) set
d) subset
Answer: b [Reason:] filter extracts a subset of rows from a data frame based on logical conditions, whereas rename is used to rename variables in a data frame.
10. dplyr can be integrated with the ________ package for large fast tables
a) data.table
b) read.table
c) data.data
d) none of the mentioned
Answer: a [Reason:] The dplyr package is a handy way to both simplify and speed up your data frame management code.