Pass bac jd_sm

Posted on 28-May-2015

April 10-12 | Chicago, IL

Using Power View and Hive to Gain Business Insights: Finding Hidden Answers in Data

Joey D’Antoni, Comcast Cable
Stacia Misner, Data Inspirations

About Us

Joey D’Antoni
• Principal Architect for SQL Server at Comcast Cable
• @jdanton on Twitter
• Joedantoni.wordpress.com

Stacia Misner
• Principal Consultant at Data Inspirations
• @StaciaMisner on Twitter
• blog.datainspirations.com

Agenda

• Introducing Big Data
• Overview and Summary of Data Set
• Insights into the Data
• Conclusions

Classic Data Analysis

• Loading
• Analyzing
• Visualization

ETL loads data into Data Warehouse & BI Solutions, which:

• Use just a subset of the data
• Require structure

Why Leave the RDBMS?

Key Differences

• Scale Out As Needed With Commodity Hardware
• Impose Schema On Read
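"Schema on read" means raw data is stored exactly as collected and a structure is applied only when it is queried. A minimal Python sketch of the idea, assuming a hypothetical comma-delimited record layout (interval start, channel, max boxes, viewing seconds) with made-up sample values; rows that don't fit the schema are skipped at read time instead of being rejected at load time.

```python
import csv
from datetime import datetime

# Raw lines are kept exactly as collected; nothing is validated on write.
# The values below are made-up sample data, not figures from the deck.
RAW_LINES = [
    "2012-07-11 20:00,ESPN,1520,418000",
    "2012-07-11 20:05,HBO,980,250500",
    "not a valid row",
]

# Hypothetical schema: (column name, parser), applied only when reading.
SCHEMA = [
    ("interval_start", lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M")),
    ("channel", str),
    ("max_boxes", int),
    ("viewing_seconds", int),
]


def read_with_schema(lines, schema):
    """Parse each raw line against the schema at read time, skipping rows that don't fit."""
    for fields in csv.reader(lines):
        if len(fields) != len(schema):
            continue  # wrong column count: skip the row, don't fail the whole read
        try:
            yield {name: parse(value) for (name, parse), value in zip(schema, fields)}
        except ValueError:
            continue  # a field didn't parse with its declared type


rows = list(read_with_schema(RAW_LINES, SCHEMA))
```

A different schema could be applied to the same raw lines later without reloading them, which is the practical difference from enforcing a schema when data is written into an RDBMS table.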

BASE: Basically Available, Soft-state, Eventually consistent
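To make "soft state" and "eventually consistent" concrete, here is a toy sketch of two key-value replicas that accept writes independently and converge after exchanging data. It assumes last-writer-wins conflict resolution by timestamp, which is only one of several strategies real systems use; the `Replica` class is hypothetical and not part of Hadoop.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy key-value replica storing (timestamp, value) per key."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally with no coordination (basically available).
        # Ties keep the existing value; real systems add a tiebreaker.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy exchange: adopt any newer entries from a peer.
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


a, b = Replica("a"), Replica("b")
a.write("ESPN", 1520, ts=1)
b.write("ESPN", 1600, ts=2)
# Before syncing, the replicas disagree (soft state).
a.merge_from(b)
b.merge_from(a)
# After syncing, both return the newest value (eventually consistent).
```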

Hadoop Ecosystem

• HDFS (Hadoop Distributed File System)
• MapReduce

Note: This is only a subset of the ecosystem!
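A single-process Python sketch of the MapReduce programming model, using the classic word-count example. The function names are illustrative, not Hadoop APIs, and nothing here reflects the presenters' demo; on a real cluster Hadoop runs many mappers in parallel over HDFS blocks and performs the shuffle between mappers and reducers itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all grouped values for one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    # Sort keys, as Hadoop delivers keys to each reducer in sorted order.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = map_reduce(["Hive runs on Hadoop", "Hadoop stores data in HDFS"])
```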

Hadoop and Hive Demo

Extract, Transform, Load (ETL) Process

Some Database → Some Process Your Business Doesn’t Care About

Credit: Buck Woody, Microsoft

Our ETL Process

Collection Server → HDFS

Hive is a data warehouse system built on top of Hadoop that lets you write SQL-like queries (HiveQL) against data sets stored in Hadoop.
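In Hive, you declare a table over delimited files that already sit in HDFS (for example with `CREATE EXTERNAL TABLE ... LOCATION`) and then aggregate them with ordinary SQL. The sketch below only illustrates that workflow: it swaps in Python's built-in `sqlite3` as a stand-in query engine, so it is not Hive, and the table name, columns, and sample values are hypothetical.

```python
import csv
import sqlite3

# Hypothetical comma-delimited records, as they might land in HDFS
# from the collection server. Values are made up for illustration.
RAW = """2012-07-11 20:00,ESPN,418000
2012-07-11 20:00,HBO,250500
2012-07-11 20:05,ESPN,402300
"""


def load_and_query(raw_text):
    """Declare a table over delimited records, then run a GROUP BY over it."""
    conn = sqlite3.connect(":memory:")  # stand-in for Hive's query engine
    conn.execute(
        "CREATE TABLE engagement "
        "(interval_start TEXT, channel TEXT, viewing_seconds INTEGER)"
    )
    rows = [(ts, ch, int(sec)) for ts, ch, sec in csv.reader(raw_text.splitlines())]
    conn.executemany("INSERT INTO engagement VALUES (?, ?, ?)", rows)
    # The same kind of aggregate a HiveQL query would express.
    result = conn.execute(
        "SELECT channel, SUM(viewing_seconds) AS total_seconds "
        "FROM engagement GROUP BY channel ORDER BY channel"
    ).fetchall()
    conn.close()
    return result


totals = load_and_query(RAW)
```

The point of Hive is that the query text stays SQL-like while the work runs across the cluster, so analysts don't have to write MapReduce jobs by hand.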

The Data Set

Set-Top Box Engagement Times

• Max Set-Top Boxes Viewing Channels
• Aggregate Viewing Seconds
• Potential Total Seconds Watched
• Recorded in 5-, 15-, and 60-minute aggregates

This data is from the week of July 11-17, 2012.
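The measures are recorded at three granularities. A minimal sketch of how coarser aggregates can be derived from 5-minute records, assuming a hypothetical record layout of (interval start, channel, aggregate viewing seconds) with made-up values; the deck does not show the actual layout of the data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical 5-minute records: (interval start, channel, aggregate viewing seconds).
FIVE_MINUTE = [
    ("2012-07-11 20:00", "ESPN", 418000),
    ("2012-07-11 20:05", "ESPN", 402300),
    ("2012-07-11 20:10", "ESPN", 395100),
    ("2012-07-11 20:15", "ESPN", 410000),
]


def roll_up(records, minutes):
    """Sum viewing seconds into buckets of `minutes` length (must divide 60)."""
    if 60 % minutes != 0:
        raise ValueError("bucket size must divide an hour evenly")
    totals = defaultdict(int)
    for start, channel, seconds in records:
        ts = datetime.strptime(start, "%Y-%m-%d %H:%M")
        # Truncate the timestamp down to the start of its bucket.
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes)
        totals[(bucket.strftime("%Y-%m-%d %H:%M"), channel)] += seconds
    return dict(totals)


fifteen = roll_up(FIVE_MINUTE, 15)
hourly = roll_up(FIVE_MINUTE, 60)
```

Summing suits viewing seconds; a peak measure such as max set-top boxes would need `max()` rather than `+=` when rolled up, since adding peaks does not give a peak.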

Preparation for Data Analysis

• Define question to answer

• Define ideal data set

• Find data

Remember Legal and Privacy Issues

Diving into Data Analysis

• Cleanse
  • Reformat as needed
  • Decide what is usable
• Explore
  • Create summaries
  • Perform statistical analysis
  • Use visualizations
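The cleanse and explore steps can be sketched with Python's standard `statistics` module. The raw readings and the rule for what counts as usable (parses as an integer and is non-negative) are assumptions for illustration only.

```python
import statistics

# Hypothetical raw viewing-second readings, including values a cleanse step should drop.
RAW_READINGS = ["418000", "402300", "", "-5", "395100", "n/a", "410000"]


def cleanse(values):
    """Reformat strings to ints and keep only usable (numeric, non-negative) readings."""
    usable = []
    for v in values:
        try:
            n = int(v)
        except ValueError:
            continue  # empty or non-numeric: not usable
        if n >= 0:
            usable.append(n)
    return usable


def summarize(values):
    """Create a simple statistical summary of the cleansed readings."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation; needs 2+ values
    }


summary = summarize(cleanse(RAW_READINGS))
```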

Aggregate Statistics on Data

Resources

• Connecting Excel to Hive (Hive ODBC Driver, Excel Hive Add-in): http://social.technet.microsoft.com/wiki/contents/articles/6226.how-to-connect-excel-to-hadoop-on-azure-via-hiveodbc.aspx

• Connecting PowerPivot to Hadoop on Azure: http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/

• Connecting Power View to Hadoop on Azure: http://dennyglee.com/2012/02/10/connecting-power-view-to-hadoop-on-azurean-awesomesauce-way-to-view-big-data-in-the-cloud/

Thank you!