Hadoop: the Big Answer to the Big Question of the Big Data

Post on 21-Nov-2014

More info: http://www.elekslabs.com/2012/02/devtalks-1-presentations.html Video: http://www.youtube.com/watch?feature=player_embedded&v=GENRle60Elk

Transcript of Hadoop: the Big Answer to the Big Question of the Big Data

eleks DevTalks #1

by Victor Haydin

Gordon Moore

                                       1975            2012

Cost of 1 TB storage                   $208 000 000    $110

Cost of 1 GFLOPS computing facility    $62 000 000     $1.50

Number of network hosts                57              > 1 000 000 000

World’s data amount                    ~130 GB         ~2.9 ZB

1 ZB = 1 000 000 000 000 000 000 000 B (10^21)
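The 1975 vs 2012 figures are easier to compare as ratios. The sketch below only redoes the arithmetic on the numbers quoted on the slide; the variable names are illustrative, and decimal (SI) units are assumed, in line with the 1 ZB definition above.

```python
# Ratios between the 1975 and 2012 figures quoted on the slide.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 B and 1 ZB = 10**21 B.
GB = 10**9
ZB = 10**21

storage_factor = 208_000_000 / 110      # price drop for 1 TB of storage
compute_factor = 62_000_000 / 1.50      # price drop for 1 GFLOPS of computing
host_factor = 1_000_000_000 / 57        # lower bound: the slide says "> 1 000 000 000"
data_factor = (2.9 * ZB) / (130 * GB)   # growth of the world's data amount

if __name__ == "__main__":
    print(f"storage: {storage_factor:.3g}x cheaper")
    print(f"compute: {compute_factor:.3g}x cheaper")
    print(f"hosts:   more than {host_factor:.3g}x as many")
    print(f"data:    {data_factor:.3g}x larger")
```

On these numbers, storage got about 1.9 million times cheaper while the amount of data grew about 22 billion times.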

Commodity Hardware

Wikipedia: “Apache Hadoop is a software framework that supports data-intensive distributed applications”

Main Contributors

HDFS: Hadoop Distributed File System

Hardware Failure

Streaming Data Access

Large Data Sets

Simple Coherency Model (write-once)

Portability

Moving Computation is cheaper than moving Data

MapReduce

Map(k1,v1) → list(k2,v2)

void map(string key, string value):
    for each word w in value:
        yield return KeyValuePair(w, 1);

Reduce(k2, list(v2)) → list(v3)

void reduce(string key, int[] values):
    int sum = 0;
    for each pc in values:
        sum += pc;
    return KeyValuePair(key, sum);
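The word-count pseudocode above can be tried without a cluster. The following is a minimal single-process sketch in Python, not Hadoop's API: the names map_words, reduce_counts and run_job are hypothetical, and the in-memory grouping stands in for the shuffle step that the framework performs between map and reduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the text."""
    for word in value.split():
        yield (word, 1)


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce(k2, list(v2)) -> v3: sum the partial counts of one word."""
    return (key, sum(values))


def run_job(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in one process (illustration only)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase plus a stand-in for the shuffle: collect values per intermediate key.
    for name, text in documents.items():
        for word, count in map_words(name, text):
            grouped[word].append(count)
    # Reduce phase: one call per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"a.txt": "big data big answer", "b.txt": "big question"}
    print(run_job(docs))
```

Here the document name plays the role of k1 and is ignored by the mapper, just as key is unused in the pseudocode.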

Demo

Ecosystem

ZooKeeper

3K+ nodes, 36+ PB

45K nodes, 180-200 PB

Future

Core:
• HDFS: high-availability and scalability
• MapReduce: modularity and alternative ways to perform queries

Ecosystem development:
• Apache BigTop: consolidation project
• HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilizing, interoperability
• Incubator: Flume, Oozie, Whirr

Demo

Q&A