Hadoop summit socialize_v1.0


Page 1: Hadoop summit socialize_v1.0

Copyright © 2013 Splunk Inc.

Big Data at the Speed of Business

Isaac Mosquera Director of Mobile, ShareThis

Clint Sharp Principal Big Data Product Manager, Splunk


Page 2: Hadoop summit socialize_v1.0

What We’ll Talk About

Our quest for visibility

Analyzing at scale

Splunk and Big Data

Where do you start?

Q&A

Page 3: Hadoop summit socialize_v1.0

About Splunk

Company (NASDAQ: SPLK)

Founded 2004, first software release in 2006

HQ: San Francisco

Business Model / Products

Industry-leading machine data platform

On-premise, in the cloud and SaaS

5,600+ Customers

63 of the Fortune 100

Largest license: 100 Terabytes per day

#1 Big Data Innovator*

* Fast Company's Most Innovative Companies Issue (March 2013)

Page 4: Hadoop summit socialize_v1.0

About ShareThis and Socialize

ShareThis makes the world more connected, trusted and valuable through sharing

Powers the social web, touching the lives of 95 percent of U.S.

Acquires Socialize, which makes mobile and social more engaging

Socialize integrated into thousands of iOS and Android apps

Installed on 80M+ devices

Page 5: Hadoop summit socialize_v1.0

Evaluating 20 Billion Ad Impressions Monthly

Page 6: Hadoop summit socialize_v1.0

Ad Request RTB

Ad Request

Socialize Bidder

Bid Response

Winning Bidder's Ad

Ad Impression

Ad Click

Little Bit About Real-Time Bidding

All this needs to happen in less than 100 milliseconds!
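The 100 millisecond budget can be illustrated with a small Python sketch. It is not the Socialize bidder's code: `bid_within_budget`, `BUDGET_MS` and the injectable `clock` are hypothetical names, and the only figure taken from the slide is the 100 ms limit.

```python
import time
from typing import Callable, Optional

BUDGET_MS = 100.0  # latency budget quoted on the slide


def bid_within_budget(
    compute_price: Callable[[], float],
    clock: Callable[[], float] = time.monotonic,
    budget_ms: float = BUDGET_MS,
) -> Optional[float]:
    """Return the bid price, or None (no bid) if pricing overran the budget."""
    start = clock()
    price = compute_price()  # hypothetical pricing step
    elapsed_ms = (clock() - start) * 1000.0
    return price if elapsed_ms < budget_ms else None
```

Passing the clock in as a parameter keeps the timing logic testable without real delays.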

Page 7: Hadoop summit socialize_v1.0

So What Are Some of the Problems?

Operational

• Ingesting more than 10,000 queries per second
• Which bids are > 100ms
• Quickly finding any errors within the system

Decision Making (Bid Algorithms)

• Campaign spending
• Campaign efficiency
• Dissect data by:
– apps
– users
– devices

Page 8: Hadoop summit socialize_v1.0

Analyzing Big Data Efficiently

1. Collection

2. Storage

3. Retrieval

4. Analysis/Aggregation

Page 9: Hadoop summit socialize_v1.0

Some Options

RDBMS: SQL functions like count() present problems at scale

RDBMS: Write operations are too high for a single DB, which would also be a single point of failure

NoSQL: Would work well for high insert and query volumes; however, we would need to build alerting, charting and reporting dashboards

Hadoop: Easy to set up and query using Hive; however, we would have to set up a new environment and learn a new technology

Page 10: Hadoop summit socialize_v1.0

Splunk Fits the Bill

Operational Reporting: Easily identify problems and prevent erroneous spending. When an alert goes off, we hit a script which shuts off the bidder.

Ad Hoc Queries: Allows us to find patterns in the data to improve our bid algorithms

Application Reporting: Instantly know campaign metrics for us and our clients

Scalability: Adding new RTB service providers means billions of new ad requests. Scaling horizontally is key.
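The alert-driven shutoff described on this slide might look roughly like the Python sketch below. The slide does not show the actual script, so everything here is assumed: the `Bidder` stand-in, `handle_spend_alert` and the spend-limit rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bidder:
    """Stand-in for the real bidder; it only tracks whether bidding is enabled."""
    enabled: bool = True

    def shut_off(self) -> None:
        self.enabled = False


def handle_spend_alert(bidder: Bidder, dollars_spent: float, spend_limit: float) -> bool:
    """Shut the bidder off when spend exceeds the limit; return True if it was shut off."""
    if dollars_spent > spend_limit:
        bidder.shut_off()
        return True
    return False
```

In practice the alert would call a script like this with the spend figure from the triggering search.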

Page 11: Hadoop summit socialize_v1.0

Analysis/Aggregation

index=ad_events displayed_ad
| bin _time span=1m
| stats count(meta.displayed_ad) as displays sum(price/1000) as dollars_spent avg(price) as avg_cpm_price by campaign_id _time
| mysqloutput spec=ads-prod table=ads_analytics insert="campaign_id, stat_date, displays, dollars_spent, avg_cpm_price"

RDBMS (Generated Reports)

Search Head

Indexer

Indexer

Indexer
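For readers who do not know the Search Processing Language, the per-minute aggregation in the query on this slide can be approximated in plain Python. This is a sketch under assumptions, not the production pipeline. The event dictionaries and `aggregate_displays` are hypothetical, `_time` is taken to be epoch seconds, and `price` is assumed to be a CPM (per thousand impressions) price, as the `avg_cpm_price` alias suggests.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_displays(events: Iterable[dict]) -> Dict[Tuple[str, int], dict]:
    """Group ad-display events by (campaign_id, minute) and compute the three stats."""
    buckets = defaultdict(list)
    for event in events:
        minute = int(event["_time"]) // 60 * 60  # floor epoch seconds to the minute
        buckets[(event["campaign_id"], minute)].append(float(event["price"]))

    results = {}
    for key, prices in buckets.items():
        results[key] = {
            "displays": len(prices),
            "dollars_spent": sum(p / 1000 for p in prices),  # assumed CPM -> dollars
            "avg_cpm_price": sum(prices) / len(prices),
        }
    return results
```

Each result row corresponds to one row that the query would hand to the reporting database.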

Page 12: Hadoop summit socialize_v1.0

Using Splunk to Analyze Operational Data

Interactive analysis with Search Processing Language:

source="nginx-prod.log" | stats avg(ResponseTime) as avg_rtime, p95(ResponseTime) as p95_rtime, stdev(ResponseTime) as stdev_rtime

Easily digest information through charts
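The three statistics in the response-time query on this slide can be reproduced in Python as a rough cross-check. This is a sketch, not a Splunk equivalent: `response_time_stats` is a hypothetical helper, and the percentile uses the nearest-rank method, which may not match Splunk's exact percentile calculation.

```python
import statistics
from typing import Dict, Sequence


def response_time_stats(response_times: Sequence[float]) -> Dict[str, float]:
    """Compute average, 95th percentile and sample standard deviation."""
    ordered = sorted(response_times)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "avg_rtime": statistics.mean(ordered),
        "p95_rtime": ordered[rank - 1],
        "stdev_rtime": statistics.stdev(ordered),  # sample stdev; needs >= 2 values
    }
```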

Page 13: Hadoop summit socialize_v1.0

Final Architecture

RDBMS (Generated Reports)

S3 Snapshots

Search Head

Socialize Bidder

Splunk

Indexer

Indexer

Indexer

Cache Cluster

Memcache Memcache Memcache

Page 14: Hadoop summit socialize_v1.0

So, What is Splunk?


Page 15: Hadoop summit socialize_v1.0

Expanding Universe of Data Sources

Machine-generated Data

Business Application Data

Human-generated Data

Highly Structured Arbitrarily Structured

2012-12-05 07:04:44 Id=00Q000000Rd910EAJ City=New York Country=US CreatedDate=“2012-12-05 07:06:44” [email protected] Email_Opt_In_c Customer_Street_Address_c=“123 Main St.” purchased_product_id=product_i BD-01 twitter_username john_t_doe

Page 16: Hadoop summit socialize_v1.0

Industry Leading Platform for Machine Data

Any Machine Data Operational Intelligence

HA Indexes and Storage

Commodity Servers

Developer Platform

Custom dashboards

Monitor and alert

Ad hoc search

Report and analyze

Page 17: Hadoop summit socialize_v1.0

Analyzing Heterogeneous Data

Universal Index Schema-on-the-fly Flexibility and Fast Time to Value

• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
• Structure applied at search-time
• No brittle schema to work around
• Automatically find transactions, patterns and trends
• Normalization as it’s needed
• Faster implementation
• Easy search language
• Multiple views into the same data
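The idea of applying structure at search time rather than at ingest can be sketched in Python. This illustrates the concept only and says nothing about how Splunk implements it. `extract_fields`, `search` and the sample events are hypothetical, and the sample uses straight quotes instead of the slide's typographic ones.

```python
import re
from typing import Dict, Iterable, List

# Matches key=value or key="quoted value"; applied when searching, not when storing.
_KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def extract_fields(raw_event: str) -> Dict[str, str]:
    """Pull key/value fields out of a raw event string on demand."""
    fields = {}
    for key, quoted, bare in _KV_PATTERN.findall(raw_event):
        fields[key] = quoted if quoted else bare
    return fields


def search(raw_events: Iterable[str], **criteria: str) -> List[str]:
    """Return raw events whose extracted fields match every criterion."""
    return [
        event
        for event in raw_events
        if all(extract_fields(event).get(k) == v for k, v in criteria.items())
    ]
```

Because events are stored raw, a new extraction rule can be applied to old data without reloading it.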

Page 18: Hadoop summit socialize_v1.0

Gain Critical Insights … in Real-time

Order ID

Customer’s Tweet

Time Waiting On Hold

Product ID

Company’s Name

Sources

Twitter

Care IVR

Middleware Error

Order Processing

Order ID

Customer ID

Twitter ID

Customer ID

Customer ID

Page 19: Hadoop summit socialize_v1.0

Deep Visibility and Insight for IT and Business

IT Operations Management Web Intelligence

Business Analytics Application Management

Security and Compliance Industrial Data / Internet of Things

Over 5,600 organizations using Splunk across IT and business users

Page 20: Hadoop summit socialize_v1.0

Driving Insights from Big Data

Page 21: Hadoop summit socialize_v1.0

Hadoop

The ShareThis Insights Platform

On Father’s Day:

“What were the most shared-about topics?”

“What type of beers do people drink?”

API ETL Pre-aggregation Analytics

?

Page 22: Hadoop summit socialize_v1.0

Finding the Optimal Approach

Hadoop and MapReduce are great for complex data science on data at rest – the previous architecture took 9 months with a team of engineers, data architects, etc.

The Splunk platform delivers real-time, interactive analysis – we can build many of the same insights within 1 hour

What should be the core focus or competency of your team?

Conclusion: find the optimal approach for the business

Page 23: Hadoop summit socialize_v1.0

What About Ad Hoc Analysis?

Page 24: Hadoop summit socialize_v1.0

PR Insights Example

What was the situation? (e.g. fast moving business, needed real-time insights)

What was the PR team struggling with? Difficult to find useful data to build interesting use-cases

What did they want? They wanted a flexible real-time reporting environment to extract insights useful for the market

How did my team help? Delivered a single dashboard that contained real-time data on the sharing behaviors across our network

Page 25: Hadoop summit socialize_v1.0

PR Insights Dashboard

Page 26: Hadoop summit socialize_v1.0

Let’s not forget the low-hanging fruit

Page 27: Hadoop summit socialize_v1.0

Operational Analytics for an Online World

website

API Notification

Google (GCM)

Feedback Processor

Apple (APNS)

? !

Notifications Systems

Driving Superior Customer Experience

How many 500 errors have I had over time?

Look for anomalies and spikes!

Zero in directly on the customer!

Online Device Notifications
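Counting 500 errors over time and looking for spikes can be sketched in Python as follows. The slide gives no thresholds, so this is only an assumed approach: the input format, the helper names and the two-standard-deviation rule are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def errors_per_minute(events: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Count HTTP 500 responses per minute from (epoch_seconds, status) pairs."""
    counts = Counter()
    for timestamp, status in events:
        if status == 500:
            counts[timestamp // 60 * 60] += 1
    return dict(counts)


def find_spikes(counts: Dict[int, int], threshold: float = 2.0) -> List[int]:
    """Return minutes whose count is more than `threshold` std devs above the mean."""
    if len(counts) < 2:
        return []
    values = list(counts.values())
    avg, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # every minute has the same count, so nothing stands out
    # Note: minutes with zero errors are absent from `counts` and are not considered.
    return sorted(m for m, c in counts.items() if (c - avg) / spread > threshold)
```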

Page 28: Hadoop summit socialize_v1.0

One More Thing …


Page 29: Hadoop summit socialize_v1.0


New product from Splunk delivers interactive data exploration, analysis and visualizations for Hadoop

Announcing Hunk Beta

Splunk Analytics for Hadoop

Page 30: Hadoop summit socialize_v1.0

Derive Actionable Insights from Raw Data

Hadoop Storage

1. Point Splunk at Hadoop Cluster

2. Immediately start exploring, analyzing and visualizing raw data in Hadoop

Explore Analyze Visualize Dashboards Share

Page 31: Hadoop summit socialize_v1.0

Learn More


splunk.com/bigdata

Page 32: Hadoop summit socialize_v1.0


Questions?