Treasure Data: Big Data Analytics on Heroku

Post on 28-May-2015


Slides from a talk given on December 6, 2012, at the theater in the Developer Zone at Cloudforce Japan.

Transcript of Treasure Data: Big Data Analytics on Heroku

Treasure Data: Big Data Analytics on Heroku

Muga Nishizawa (@muga_nishizawa), Chief Software Architect, Treasure Data


Treasure Data Overview

• Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of other alternatives
• Service-based subscription business model
• World-class open source team
  • Founded world’s largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
• Treasure Data is in production
  • 20 customers incl. Fortune 500 companies
  • 100+ billion records stored
  • Processing 10,000 messages per second


Our Customers – Fortune Global 500 leaders and start-ups including:


One Hundred Billion Records and Growing!

[Chart: records stored over time; y-axis ticks from 20 to 120, x-axis from Sep 2011 to Aug 2012]


Treasure Data Service
“Store Your Data Now for Future Insights”


Treasure Data Service
“Store Your Data Now for Future Insights”

[Architecture diagram]
• Data sources: Apache, apps, RDBMS, other data sources
• Treasure Data: columnar data storage, query processing cluster (MapReduce jobs)
• Query API: Hive, Pig (to be supported)
• Access: JDBC, REST, td-command, BI apps
• Users


2012-02-04 01:33:51 myappdb.buylog {"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}
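The record above can be read as three parts: a timestamp, a tag of the form database.table, and a JSON body. The following is a minimal sketch of that reading, not the service's actual ingestion code. The function name `parse_event` is hypothetical, and treating the timestamp as UTC is an assumption the slide does not confirm.

```python
import json
from datetime import datetime, timezone


def parse_event(line):
    """Split 'YYYY-MM-DD HH:MM:SS database.table {json}' into its parts."""
    date_part, time_part, tag, payload = line.split(" ", 3)
    database, table = tag.split(".", 1)
    # Assumption: the timestamp is UTC (the slide does not say).
    ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    ts = ts.replace(tzinfo=timezone.utc)
    record = json.loads(payload)
    # Store the event time as a Unix timestamp in a "time" field,
    # matching the `time` column used by the later query.
    record["time"] = int(ts.timestamp())
    return database, table, record


line = ('2012-02-04 01:33:51 myappdb.buylog '
        '{"user": "12345", "path": "/buyItem", "price": 150, "referer": "/landing"}')
database, table, record = parse_event(line)
print(database, table, record["path"], record["price"])
```

Under this reading, the tag `myappdb.buylog` lines up with the database (`-d myappdb`) and table (`FROM buylog`) used in the query shown later in the deck.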


$ td query -w -d myappdb \
  "SELECT \
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day, \
     COUNT(1) AS cnt \
   FROM buylog \
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') \
   ORDER BY cnt"


+------------+------+
| day        | cnt  |
+------------+------+
| 2012-05-26 | 4981 |
| 2012-05-27 | 4481 |
| 2012-05-28 | 481  |
+------------+------+
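The daily count above groups events by calendar day in a given time zone. The following is a rough local sketch of the same grouping, not the service's implementation. The sample rows are hypothetical, and PDT is modeled as a fixed UTC-7 offset, which is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: PDT is modeled as a fixed UTC-7 offset.
PDT = timezone(timedelta(hours=-7))


def day_in_pdt(unix_time):
    """Format a Unix timestamp as yyyy-MM-dd in PDT."""
    return datetime.fromtimestamp(unix_time, tz=PDT).strftime("%Y-%m-%d")


def daily_counts(rows):
    """GROUP BY day, COUNT(1), ORDER BY cnt (ascending)."""
    counts = Counter(day_in_pdt(row["time"]) for row in rows)
    return sorted(counts.items(), key=lambda item: item[1])


# Hypothetical rows; only the `time` column matters for this query.
rows = [
    {"time": int(datetime(2012, 5, 26, 9, 0, tzinfo=PDT).timestamp())},
    # 03:00 UTC on May 27 is still 20:00 on May 26 in PDT.
    {"time": int(datetime(2012, 5, 27, 3, 0, tzinfo=timezone.utc).timestamp())},
    {"time": int(datetime(2012, 5, 27, 12, 0, tzinfo=PDT).timestamp())},
]

result = daily_counts(rows)
for day, cnt in result:
    print(day, cnt)
```

The second sample row shows why the time zone argument matters: the same event falls on different days depending on the zone used for formatting.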


Comparing On-Premise & Cloud Big Data Markets

[Chart: on-premise vs. cloud offerings, plotted against data volume from low to high]
• On-premise: traditional DBMS (ODS, data mart), data warehouse, Hadoop
• Cloud: Database-as-a-Service, Big Data-as-a-Service

© 2012 Forrester Research, Inc. Reproduction Prohibited

Treasure Data as Heroku Add-on


Demo with Heroku


Synergy Effect for Data-Driven Development!


×

The Power of the Cloud

• Easier to Scale
• Easier to Maintain
• Easier to Iterate


Implementation Process: Traditional DW and On-Premise Big Data


Dramatically streamlined implementation process


Heroku × Treasure Data

Viki.com: “Global Hulu”


Viki Before

• Hard to manage Hadoop
• Complicated data collection


Viki After

• No more Hadoop maintenance
• Versatile data collector, td-agent


Please Try It!


How Does It Work?


Query Processing

• Query Language
• Query Execution
• Columnar Data
• Object Storage


1/4: Compile SQL into MapReduce

SELECT COUNT(DISTINCT ip) FROM tbl;
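To illustrate what compiling this query into MapReduce can look like, here is a simplified single-process sketch. The slides do not show the actual plan the service generates, so the three phases below (`map_phase`, `shuffle`, `reduce_phase`) and the sample table are hypothetical.

```python
from itertools import groupby


def map_phase(rows):
    """Map: emit each row's ip as the key; no value is needed."""
    for row in rows:
        yield row["ip"], None


def shuffle(pairs):
    """Shuffle: sort by key so equal keys end up in the same group."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    for key, group in groupby(ordered, key=lambda pair: pair[0]):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reduce: each distinct key appears once, so count the groups."""
    return sum(1 for _key, _values in grouped)


# Hypothetical table contents.
tbl = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.1"}]

distinct_ips = reduce_phase(shuffle(map_phase(tbl)))
print(distinct_ips)  # 2
```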


2/4: MapReduce is executed in parallel

cc2.8xlarge cluster compute instance (up to 100 nodes * 32 threads)

SELECT COUNT(DISTINCT ip) FROM tbl;
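A rough sketch of the parallel step, with local threads standing in for cluster nodes: the table is divided into splits, each worker computes the distinct ips in its split, and the partial results are merged. The function names, the split strategy, and the sample data are assumptions for illustration, not details from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split(rows, n_splits):
    """Divide rows into n_splits roughly equal input splits."""
    return [rows[i::n_splits] for i in range(n_splits)]


def partial_distinct(rows):
    """Work done independently on one split: collect its distinct ips."""
    return {row["ip"] for row in rows}


def parallel_count_distinct(rows, n_workers=4):
    """Run the per-split work in parallel, then merge the partial sets."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_distinct, split(rows, n_workers)))
    # The same ip can appear in several splits, so the merge must take
    # a union rather than add up per-split counts.
    return len(set().union(*partials))


# Hypothetical table: 1000 rows over 50 distinct addresses.
tbl = [{"ip": f"10.0.0.{i % 50}"} for i in range(1000)]

total = parallel_count_distinct(tbl)
print(total)  # 50
```

Adding up per-split counts would overcount here, which is why the final merge step is needed even when the splits are processed fully in parallel.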


3/4: Columnar Data Access

10Gbps Network

Read ONLY the Required Part of Data

SELECT COUNT(DISTINCT ip) FROM tbl;
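The idea of reading only the required part of the data can be sketched by comparing a row layout with a column layout. This is a toy in-memory illustration, not the service's storage format; the sample rows and helper names are hypothetical.

```python
# Hypothetical rows with three columns each.
rows = [
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 150},
    {"ip": "10.0.0.2", "path": "/landing", "price": 0},
    {"ip": "10.0.0.1", "path": "/buyItem", "price": 300},
]


def to_columns(rows):
    """Convert a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def count_distinct(columns, name):
    """COUNT(DISTINCT name): touches only the one column it needs."""
    return len(set(columns[name]))


columns = to_columns(rows)
distinct_ips = count_distinct(columns, "ip")

# Values read by the query vs. values stored in the whole table.
values_read = len(columns["ip"])
values_total = sum(len(values) for values in columns.values())
print(distinct_ips, values_read, values_total)  # 2 3 9
```

With a row layout, answering the same query would mean scanning all nine values; with the column layout, only the three `ip` values are read.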


4/4: Object-based Storage


Enjoy Data-Driven Development!


Big Data for the Rest of Us

www.treasure-data.com | @TreasureData


Great Investors

• Bill Tai
• Naren Gupta – Nexus Ventures, Director of Red Hat, TIBCO
• Dave Stamm – Clarify, Daisy Systems, Enkata
• Othman Laraki – Twitter
• James Lindenbaum, Adam Wiggins and Orion Henry – Heroku
• Anand Babu Periasamy and Hitesh Chellani – Gluster
• Yukihiro “Matz” Matsumoto – Creator of Ruby, now at Heroku
• Dan Scheinman – Former Cisco SVP
• Jean-Philippe Emelie Marcos – Tango, D.E. Shaw
• + executives from Cisco, Red Hat, Salesforce.com, GREE


What are your options?

Traditional
• Too much complexity
• Too long to get live
• Too expensive to maintain
• Can only innovate at speed of vendor

On-Premise Hadoop
• Never designed for analytic processing
• Too many people
• Too much software from too many sources

Cloud Hadoop
• Partial solution
• Vendor lock-in


Example Use Case – MySQL to TD
