Pass bac jd_sm

April 10-12 | Chicago, IL Using Power View and Hive to Gain Business Insights Finding Hidden Answers in Data Joey D’Antoni, Comcast Cable Stacia Misner, Data Inspirations

Transcript of Pass bac jd_sm

Page 1: Pass bac jd_sm

April 10-12 | Chicago, IL

Using Power View and Hive to Gain Business Insights

Finding Hidden Answers in Data

Joey D’Antoni, Comcast Cable

Stacia Misner, Data Inspirations

Page 2: Pass bac jd_sm

Please silence cell phones

Page 3: Pass bac jd_sm

About Us

Joey D’Antoni

• Principal Architect for SQL Server at Comcast Cable
• @jdanton on Twitter
• Joedantoni.wordpress.com

Stacia Misner

• Principal Consultant at Data Inspirations
• @StaciaMisner on Twitter
• blog.datainspirations.com

Page 4: Pass bac jd_sm

Agenda

• Introducing Big Data
• Overview and Summary of Data Set
• Insights into the Data
• Conclusions

Page 5: Pass bac jd_sm

Classic Data Analysis

Loading Analyzing Visualization

Page 6: Pass bac jd_sm

Classic Data Analysis

Data Warehouse & BI Solutions

ETL

…Uses Just a Subset

Page 7: Pass bac jd_sm

Classic Data Analysis

Data Warehouse & BI Solutions

ETL

…Requires Structure

Page 8: Pass bac jd_sm

Why Leave the RDBMS

Page 9: Pass bac jd_sm

Key Differences

Scale Out As Needed With Commodity Hardware

Impose Schema On Read

Basically Available

Soft-state

Eventually consistent
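
To make "schema on read" concrete, here is a minimal sketch in Python. The record layout, field names, and sample values are hypothetical and invented for illustration; they are not taken from the presenters' data. The point is only that raw text is stored untouched and a schema is applied when the data is read.

```python
import csv
import io

# Raw records are stored exactly as collected; nothing is validated on write.
# The three-field layout is a hypothetical example, not the real data format.
RAW = (
    "101,2012-07-11T20:00,300\n"
    "102,2012-07-11T20:00,not-a-number\n"
    "101,2012-07-11T20:05,240\n"
)

# The schema belongs to the reader: a column name plus a parser per column.
SCHEMA = [("channel", int), ("start", str), ("seconds", int)]


def read_with_schema(raw_text, schema):
    """Yield one dict per record, applying the schema only at read time."""
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (name, parse), value in zip(schema, fields):
            try:
                row[name] = parse(value)
            except ValueError:
                # A malformed value becomes None instead of blocking the load.
                row[name] = None
        yield row


rows = list(read_with_schema(RAW, SCHEMA))
```

A different reader could apply a different schema to the same raw text, which is the flexibility a schema-on-write load gives up.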

Page 10: Pass bac jd_sm

Hadoop Ecosystem

HDFS

MapReduce

Note: This is only a subset of ecosystem!
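
As a rough illustration of the MapReduce model named above, the following single-process Python sketch walks through the map, shuffle, and reduce steps. It does not use Hadoop; the function names and the sample records are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs from one input record."""
    channel, seconds = record
    yield channel, seconds


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical (channel, viewing seconds) records.
records = [("ch101", 300), ("ch102", 120), ("ch101", 240)]

pairs = [pair for record in records for pair in map_phase(record)]
totals = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on many nodes, and the shuffle moves data between them over the network.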

Page 11: Pass bac jd_sm

Hadoop and Hive Demo

Page 12: Pass bac jd_sm

Extract, Transform, Load (ETL) Process

Some Database

Process Your Business Doesn’t Care About

Some

Credit—Buck Woody, Microsoft

Page 13: Pass bac jd_sm

Our ETL Process

Collection Server

HDFS

Hive is a data warehouse system that runs on top of Hadoop and lets you write SQL-like queries against data sets stored in Hadoop.
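
The kind of query this enables can be sketched as follows. This is an illustration only: SQLite stands in for Hive so the example runs locally, and the table name, column names, and rows are hypothetical. The SELECT uses only plain aggregate SQL, which should be expressible in HiveQL in much the same form.

```python
import sqlite3

# SQLite is a local stand-in so the sketch runs anywhere; it is not Hive.
conn = sqlite3.connect(":memory:")

# Hypothetical table and columns, invented for this example.
conn.execute("CREATE TABLE engagement (channel TEXT, viewing_seconds INTEGER)")
conn.executemany(
    "INSERT INTO engagement VALUES (?, ?)",
    [("ch101", 300), ("ch102", 120), ("ch101", 240)],
)

# Total viewing seconds per channel, largest first.
QUERY = """
SELECT channel, SUM(viewing_seconds) AS total_seconds
FROM engagement
GROUP BY channel
ORDER BY total_seconds DESC
"""

result = conn.execute(QUERY).fetchall()
conn.close()
```

With Hive, the same statement would be submitted to the cluster (for example through the Hive ODBC driver listed under Resources) rather than to a local database file.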

Page 14: Pass bac jd_sm

The Data Set

Set Top Box Engagement Times

• Max Set Top Boxes Viewing Channels
• Aggregate Viewing Seconds
• Potential Total Seconds Watched
• Recorded in 5, 15 and 60 minute aggregates

This data is from the week of July 11-17, 2012
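
The relationship between the 5, 15 and 60 minute aggregates can be sketched in Python. This assumes the coarser aggregates are sums of the finer ones, which holds for a measure like viewing seconds but not for a maximum such as the set top box count; the slides do not say how the source system actually produced them. The sample values are made up.

```python
from collections import defaultdict
from datetime import datetime


def roll_up(samples, minutes):
    """Sum viewing seconds into buckets of `minutes` minutes (must divide 60)."""
    buckets = defaultdict(int)
    for start, seconds in samples:
        # Floor the timestamp to the start of its bucket.
        floored = start.replace(
            minute=start.minute - start.minute % minutes,
            second=0,
            microsecond=0,
        )
        buckets[floored] += seconds
    return dict(buckets)


# Hypothetical 5-minute aggregates: (interval start, viewing seconds).
five_minute = [
    (datetime(2012, 7, 11, 20, 0), 300),
    (datetime(2012, 7, 11, 20, 5), 240),
    (datetime(2012, 7, 11, 20, 10), 180),
    (datetime(2012, 7, 11, 20, 15), 60),
]

fifteen_minute = roll_up(five_minute, 15)
hourly = roll_up(five_minute, 60)
```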

Page 15: Pass bac jd_sm

Preparation for Data Analysis

• Define question to answer

• Define ideal data set

• Find data

Page 16: Pass bac jd_sm

Remember Legal and Privacy Issues

Page 17: Pass bac jd_sm

Diving into Data Analysis

• Cleanse
  • Reformat as needed
  • Decide what is usable

• Explore
  • Create summaries
  • Perform statistical analysis
  • Use visualizations
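
The cleanse and explore steps above can be sketched with Python's standard library. The values and the rule for what counts as usable (present and non-negative) are assumptions made for this example, not the presenters' actual criteria.

```python
import statistics

# Hypothetical raw viewing-seconds values, including unusable entries.
raw = [300, None, 240, -5, 180, 60, 420]

# Cleanse: keep only values that are present and non-negative.
usable = [value for value in raw if value is not None and value >= 0]

# Explore: basic summary statistics over the usable values.
summary = {
    "count": len(usable),
    "min": min(usable),
    "max": max(usable),
    "mean": statistics.mean(usable),
    "median": statistics.median(usable),
    "stdev": statistics.stdev(usable),  # sample standard deviation
}
```

Recording how many values were dropped during cleansing is worth doing as well, since a large discard rate changes how far the summaries can be trusted.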

Page 18: Pass bac jd_sm

Aggregate Statistics on Data

Page 19: Pass bac jd_sm

Resources

Connecting Excel to Hive (Hive ODBC Driver, Excel Hive Add-in)
• http://social.technet.microsoft.com/wiki/contents/articles/6226.how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx

Connecting PowerPivot to Hadoop on Azure
• http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/

Connecting Power View to Hadoop on Azure
• http://dennyglee.com/2012/02/10/connecting-power-view-to-hadoop-on-azurean-awesomesauce-way-to-view-big-data-in-the-cloud/

Page 20: Pass bac jd_sm

Thank you!

Diamond Sponsor