Horizon 20110928

NEARING THE EVENT HORIZON. HADOOP WAS PREDICTABLE, WHAT’S NEXT?

Mike Miller (UW), @mlmilleratmit

September 28, 2011

Mike Miller

What I Am

Assistant Professor, Particle Physics (UW)

Cloudant Founder, Chief Scientist

Background: machine learning, analysis, big data, globally distributed systems

What I Am Not

Didn’t see these coming:
• Superluminal neutrinos
• Red Sox blow 9 game lead in September
• Amazon Silk...

But here I go anyway

My First Postulate of Big-Data

What matters for Google...
... matters for the internet...
... and therefore matters for the enterprise...
... will therefore be re-architected by Apache...
... and therefore matters to you.

Google Matters

Evidence

Business Week, 12/24/2007

The Old Canon

• Google File System (the important one): http://labs.google.com/papers/gfs.html

• MapReduce (the big one): http://labs.google.com/papers/mapreduce.html

• BigTable (clone me!): http://labs.google.com/papers/bigtable.html

• Dynamo (ok, AWS. but masterless quorum): http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

copy these. use these. print $$$

So... is that it?

http://gigaom.com/cloud/democratizing-big-data-is-hadoop-our-only-hope/

What’s Painful about MapReduce?

• Processing latency: non-incremental, must re-slurp entire dataset every pass

• Ad-hoc queries: bare metal interface, data import

• Graphs: only a handful of graph problems amenable to MR
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120

Enter The New Canon

• Percolator: incremental processing
http://research.google.com/pubs/pub36726.html

• Dremel: ad-hoc analysis queries
http://research.google.com/pubs/pub36632.html

• Pregel: big graphs
http://dl.acm.org/citation.cfm?id=1807184

Scalable, Fault Tolerant, Approachable

Percolator: incremental processing

• Replaced MapReduce as the tool to build search index
“However, reprocessing the entire web discards the work done in earlier runs and makes latency proportional to the size of the repository, rather than the size of the update.”

• Bigtable alone can’t do it
“BigTable scales...but doesn’t provide tools to help programmers maintain data invariants in the face of concurrent updates.”

• Applicability
Incrementally updating data
Computational output can be broken down into small pieces
Computation large in some dimension (data size, cpu, etc)

• Does it matter?
“...Converting the indexing system to an incremental system ... reduced the average document processing latency by a factor of 100...”

Percolator: incremental processing

• BigTable plus...

Transactions: snapshot isolation, locks

Timestamps

Notifications

Observers: your code to be run upon notification of an update
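A minimal sketch of the notification/observer idea, in Python. This is not Percolator's API: `ToyTable`, `observe`, `write`, `run` and `index_document` are hypothetical names, the store is a plain in-memory dict, and transactions, locks and timestamps are left out entirely. It only shows how a write to one row triggers work proportional to that row rather than to the whole dataset.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a Bigtable-like store (hypothetical, no transactions)."""

    def __init__(self):
        self.rows = defaultdict(dict)       # row key -> {column: value}
        self.observers = defaultdict(list)  # column -> registered callbacks
        self.pending = []                   # queued notifications: (row, column)

    def observe(self, column, callback):
        """Register code to run when a cell in `column` changes."""
        self.observers[column].append(callback)

    def write(self, row, column, value):
        """Store a value and queue a notification for that cell."""
        self.rows[row][column] = value
        self.pending.append((row, column))

    def run(self):
        """Drain notifications; observers may write and so trigger more."""
        while self.pending:
            row, column = self.pending.pop(0)
            for callback in self.observers[column]:
                callback(self, row)


index = defaultdict(set)  # term -> set of documents containing it


def index_document(table, row):
    """Observer: update the inverted index for one changed document only."""
    old = table.rows[row].get("indexed_terms", set())
    new = set(table.rows[row]["raw"].lower().split())
    for term in old - new:
        index[term].discard(row)
    for term in new - old:
        index[term].add(row)
    table.write(row, "indexed_terms", new)


table = ToyTable()
table.observe("raw", index_document)
table.write("doc1", "raw", "big data")
table.write("doc2", "raw", "big graphs")
table.run()

# Updating one document touches only that document's terms.
table.write("doc1", "raw", "fast data")
table.run()
```

The second `run()` does no work for `doc2`, which is the point of the slide: latency follows the size of the update, not the size of the repository.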

Dremel: ad-hoc Query

• Scalable, interactive ad-hoc query system for read-only nested data
“...capable of running aggregation queries over trillion-row tables in seconds.”

• ... on nested data structures in situ
Web and scientific data is often non-relational
Nested data (protobufs) underlies most structured data at Google

• Usage
DEFINE TABLE t AS /path/to/data/*
SELECT TOP(signal1,100), COUNT(*) FROM t

• Applicability
Analysis of crawled documents
Tracking of install data for apps on Android Market
Crash reports
Spam analysis
...

dream BI tool

Dremel: ad-hoc Query

• Ingredients
In situ data
SQL like interface
Serving trees for query execution
Column striped data
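A rough Python sketch of two of these ingredients, column striping and a serving tree, applied to a query shaped like the usage example. Assumptions: `TOP(signal1, k)` with `COUNT(*)` is taken to mean "the k most frequent values of signal1 with their counts"; `stripe` and `top_count` are hypothetical helpers; records are flat, so Dremel's encoding of nested and repeated fields is not modelled; the "tree" has a single level of leaves under one root.

```python
from collections import Counter


def stripe(records):
    """Turn row-oriented records into one list per field (flat fields only)."""
    columns = {}
    for rec in records:
        for field, value in rec.items():
            columns.setdefault(field, []).append(value)
    return columns


def top_count(shards, field, k):
    """Leaves count one column of their shard; the root merges the partials."""
    partials = [Counter(shard[field]) for shard in shards]  # leaf servers
    merged = Counter()
    for partial in partials:                                # root server
        merged.update(partial)
    return merged.most_common(k)


records = [
    {"signal1": "a", "url": "http://example.com/1"},
    {"signal1": "b", "url": "http://example.com/2"},
    {"signal1": "a", "url": "http://example.com/3"},
    {"signal1": "c", "url": "http://example.com/4"},
    {"signal1": "a", "url": "http://example.com/5"},
    {"signal1": "b", "url": "http://example.com/6"},
]

# Two shards, each stored column-wise; the query never reads the "url" column.
shards = [stripe(records[:3]), stripe(records[3:])]
result = top_count(shards, "signal1", 2)
```

Because each column is stored separately, the aggregation reads only `signal1`, and each leaf returns a small partial count instead of raw rows.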

Pregel: Big Graphs

• Massively parallel processing of big graphs
billions of vertices, trillions of edges

• Bulk synchronous parallel model
sequence of vertex oriented iterations
send/receive messages from other vertex computations
read/modify state of vertex, outgoing edges, graph topology

• Expressive, easy to program
distribution details hidden behind abstract API

• Iterative
computation continues until each vertex votes to terminate

• In production
PageRank 15 lines of code

Nothing like this exists in open source
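To make the bulk synchronous parallel model concrete, here is a single-process Python sketch of vertex-centric PageRank. It is not Pregel's API: `pagerank_bsp` and its parameters are hypothetical, the graph is a plain dict, every vertex is assumed to have at least one outgoing edge, and a fixed number of supersteps stands in for vertices voting to halt.

```python
def pagerank_bsp(graph, supersteps=30, damping=0.85):
    """graph: dict mapping each vertex to a list of its out-neighbours.

    Every vertex must appear as a key and have at least one outgoing edge.
    """
    n = len(graph)
    value = {v: 1.0 / n for v in graph}  # per-vertex state
    inbox = {v: [] for v in graph}       # messages delivered this superstep

    for step in range(supersteps + 1):
        outbox = {v: [] for v in graph}
        for v, edges in graph.items():   # one "compute" call per vertex
            if step > 0:
                # Combine messages sent during the previous superstep.
                value[v] = (1 - damping) / n + damping * sum(inbox[v])
            if step < supersteps:
                # Send this vertex's rank share along each outgoing edge.
                share = value[v] / len(edges)
                for dst in edges:
                    outbox[dst].append(share)
            # else: the vertex would vote to halt here.
        inbox = outbox                   # barrier: messages arrive next step
    return value


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
```

Each vertex reads only its own state and incoming messages, which is what lets the real system hide distribution behind the API.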

Pregel: Big Graphs

• Master “Name” node
connects processes for messaging

• Message Passing
no remote procedures, reads

• Graph hashed across nodes
vertex, outgoing edges stored in RAM

• Aggregators
global mechanism for aggregation
all but final reduce computed on node local data

• Checkpointing
configurable, enables automatic recovery
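Two of the bullets above, hashing the graph across nodes and aggregators, can be sketched in a few lines of Python. The helper names `partition` and `aggregate_sum` are hypothetical, "nodes" are just lists in one process, and `zlib.crc32` is used only because it is a deterministic hash available in the standard library; nothing here is taken from Pregel's actual implementation.

```python
import zlib


def partition(vertex_ids, num_workers):
    """Assign each vertex to a worker by hash(id) mod num_workers."""
    workers = [[] for _ in range(num_workers)]
    for vid in vertex_ids:
        slot = zlib.crc32(vid.encode("utf-8")) % num_workers
        workers[slot].append(vid)
    return workers


def aggregate_sum(workers, value_of):
    """Each worker reduces its local vertices; only the partials are combined."""
    partials = [sum(value_of[v] for v in local) for local in workers]  # node local
    return sum(partials)                                               # final reduce


values = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
workers = partition(list(values), num_workers=3)
total = aggregate_sum(workers, values)
```

Only one number per worker crosses the network in the final step, which is why aggregation stays cheap even on very large graphs.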

Lessons Learned

• Hire Jeff Dean and Sanjay Ghemawat

• GFS enables everything

• There is massive opportunity on the horizon
