Spark Week5 Quiz3 v3

7/23/2019 Spark Week5 Quiz3 v3

http://slidepdf.com/reader/full/spark-week5-quiz3-v3 1/3

12/27/2015 Spark SQL and Hive | Coursera

https://www.coursera.org/learn/bigdata-analytics/exam/ZQuyq/spark-sql-and-hive

Spark SQL and Hive

6 questions

1.

What makes DataFrames and Database tables conceptually equivalent?

Both support any number of rows

They are collections of rows with typed columns

They are collections of columns with typed rows

2.

What is the functionality of registerTempTable?

Save a temporary table to Hive

Prepare a temporary database table interface for a DataFrame

Save a DataFrame to Hive

Save a temporary table to HDFS

3.

What is the most efficient interface to analyze data with DataFrames and why?

Either DataFrame calls or SQL are equally efficient because they feed to the same optimizer.

DataFrame calls are native to Spark so are more efficient.

Either DataFrame calls or SQL are equally efficient because DataFrame calls are translated to SQL under the hood.

SQL is more efficient because it is lower level, so there is less overhead.

4.

Why would you want to use the SQL interface instead of DataFrame calls?

Check all options that apply.

In a PySpark shell it is a lot easier to debug SQL than DataFrame calls

My analysis is written more easily in SQL

I already have SQL code from a previous application

It is more efficient

5.

How do you set up Spark so it can connect to Hive?

Copy hive-site.xml to Spark's conf folder

Configure Hive properties on the SparkContext object sc

Open pyspark shell with --hive argument

6.

Which of these objects are persistent across different PySpark shell instances (i.e., after closing the shell and restarting it)?

DataFrames registered with registerTempTable

DataFrames saved to Hive with saveAsTable

DataFrames cached in memory with .cache

DataFrames
