Self-Service Analytics on Hadoop: Lessons Learned

Post on 16-Apr-2017



June 29, 2016

Drew Leamon

Director – Advanced Technology Solutions

Comcast: Shaping the Future of Media and Technology

High Speed Internet

Forecast

Engineering Design

Budget

Engineering Analysis: Global Central Analysis Team

Animals are Best Suited in Their Native Habitat

Spreadsheets: The Natural Habitat of Analysts

Evolution of Self Service Analytics

SSRS

Self Service: Native Habitat

Limitations of the Spreadsheet Native Habitat

• 1 Million Row Max

Self Service
• Not Even Medium Data
• Not Collaborative
• No Automation
• Not Repeatable

IT Analyst

Self Service: How We Started

Analyst goes to IT, makes a request, and waits weeks for results

SSRS

• 10 TB Storage
• 1 Compute Node

Not Self Service
• 10 TB (Medium Data)
• Limited Compute
• IT Hand-off
• Consultative service
• Not self service

IT Analysts

A bigger database still meant building dashboards for the team

IT Analysts

Still Not Self Service
• 100s of TBs (Large Data)
• Data silos
• IT Hand-off
• Consultative service
• Analysts not SQL experts

Graduated to Specialized Databases

• Clustered Storage
• Columnar Compression
• Clustered Compute

Datameer, native on Hadoop, enables self-service for big data

Analysts

True Self Service
• PB == Big Data
• Data Lake
• Excel-like UI
• No more waiting for IT

Self Service: The New Way

• Clustered Storage
• Columnar Compression
• Clustered Compute
• Liberated Data


Multiple Configurations for Big Data


Engineering Analysis

IP Telephony

Video Research

IP Video Engineering

X1 Operations

Advanced Advertising

Web Analytics

Enterprise Business Intelligence

Network Engineering

Mature

Evolving

On-Boarded

On-Deck

Expanding Use Cases with Datameer

Use Case #1: Comcast Digital Voice

One Of The Largest IP Telephony Networks

Anonymized Call Detail Records (CDR) Data Set

Data complexity from network

Data size: TBs/month

Discovered Unusual Patterns

Noticed large spikes for high cost areas
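The slides do not say how the spikes were found. As a minimal sketch of one common approach, the snippet below compares each day's call volume for an area against that area's median and flags days that exceed a multiple of it. The function name `find_spikes`, the `factor` threshold, and the sample figures are all hypothetical and are not taken from the presentation.

```python
from statistics import median


def find_spikes(daily_minutes, factor=3.0):
    """Return (day, minutes) pairs whose volume exceeds factor x the median.

    daily_minutes: list of (day, minutes) tuples for one destination area.
    factor: hypothetical threshold; tune it to the data at hand.
    """
    baseline = median(minutes for _, minutes in daily_minutes)
    return [(day, minutes) for day, minutes in daily_minutes
            if minutes > factor * baseline]


# Hypothetical daily call minutes to a single high-cost area.
sample = [
    ("2016-01-01", 100),
    ("2016-01-02", 110),
    ("2016-01-03", 95),
    ("2016-01-04", 900),
    ("2016-01-05", 105),
]

spikes = find_spikes(sample)
print(spikes)
```

A median baseline is used here because a single large spike barely moves it, whereas a mean would be pulled toward the outlier.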

Hypothesis: Network Abuse

30% of this traffic was coming from three accounts. 

Analysis Shows Traffic Concentrated in a Few Accounts
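A concentration finding like the one above comes down to a group-by and a ratio. The sketch below shows that arithmetic in plain Python; the function name `top_account_share`, the account IDs, and the minute counts are hypothetical, with the sample values chosen only so that the top three accounts carry 30% of the total. It is not the analysis the team actually ran in Datameer.

```python
from collections import Counter


def top_account_share(records, n=3):
    """Sum minutes per account and return the top-n accounts with their
    combined share of total traffic (a fraction between 0 and 1)."""
    totals = Counter()
    for account_id, minutes in records:
        totals[account_id] += minutes
    grand_total = sum(totals.values())
    top = totals.most_common(n)
    top_minutes = sum(minutes for _, minutes in top)
    share = top_minutes / grand_total if grand_total else 0.0
    return top, share


# Hypothetical anonymized call records: (account_id, minutes).
records = [("acct-1", 70), ("acct-1", 50), ("acct-2", 100), ("acct-3", 80)]
records += [(f"acct-{i}", 50) for i in range(4, 18)]

top, share = top_account_share(records)
print(top, f"{share:.0%}")
```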

Ongoing Monitoring of Future Abuse

Analyst scheduled a Tableau Data Extract and built a Tableau dashboard. Now the business can keep an eye out for further abuse.

Result: Future Abuse Prevented and More

• Abuse detected
• Analysts empowered
• Resources saved
• No IT hand-off
• Value to organization
• Automated and repeatable


Use Case #2: Customer Perspective

How to measure customer experience from the customer perspective


Millions of Viewing Experiences


Improved Customer Experience through Data Analytics

Findings / Analysis

Best Practices

Improved Customer Experience

Data-driven scheduling

Dataflow automation

Solution:

Blend
• Aggregations of large datasets from disparate data sources: RDBMS, HDFS, APIs
• Data joins / data quality checks / pipeline between clusters

Analyze
• Build views quickly and aggregate large datasets
• Early visibility of data in Hadoop

Share Insights
• Create repeatable processes through automated workflow
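To make the join and data quality check from the blend step concrete, here is a minimal sketch in plain Python. It joins viewing sessions to account records on a shared key and sets aside rows with no match. The field names (`account_id`, `region`, `buffering_s`), the function name `blend`, and the sample rows are hypothetical; the presentation does not describe the actual schemas or the Datameer workflow.

```python
def blend(sessions, accounts):
    """Join sessions to accounts on account_id.

    Returns (joined, unmatched): joined rows carry the account's region;
    unmatched rows failed the quality check because no account was found.
    """
    by_id = {row["account_id"]: row for row in accounts}
    joined, unmatched = [], []
    for session in sessions:
        account = by_id.get(session["account_id"])
        if account is None:
            unmatched.append(session)  # quality check: orphaned session
        else:
            joined.append({**session, "region": account["region"]})
    return joined, unmatched


# Hypothetical rows, e.g. sessions from HDFS and accounts from an RDBMS.
sessions = [
    {"account_id": "a1", "buffering_s": 4},
    {"account_id": "a2", "buffering_s": 0},
    {"account_id": "zz", "buffering_s": 9},
]
accounts = [
    {"account_id": "a1", "region": "east"},
    {"account_id": "a2", "region": "west"},
]

joined, unmatched = blend(sessions, accounts)
print(len(joined), "joined;", len(unmatched), "unmatched")
```

Keeping the unmatched rows rather than silently dropping them lets the analyst see how much data falls out of the join before trusting any aggregate built on it.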


Result: Data-driven Customer Viewing Experience Enhancements

• Customer experience improved
• Analysts empowered
• Capital spend directed intelligently
• No IT hand-off
• Value to organization
• Automated and repeatable