IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Crunch Big Data in the Cloud with IBM BigInsights and Hadoop IBD-3475

Leons Petrazickis, IBM Canada

@leonsp

© 2013 IBM Corporation

Please note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

First step

Request a lab environment

http://bit.ly/requestLab

BigDataUniversity.com

Hadoop Architecture

Agenda

• Terminology review

• Hadoop architecture

– HDFS

– Blocks

– MapReduce

– Types of nodes

– Topology awareness

– Writing a file to HDFS

Terminology review

Diagram: a Hadoop cluster is made up of racks (Rack 1, Rack 2, … Rack n), and each rack holds nodes (Node 1, Node 2, … Node n).

Hadoop architecture

• Two main components:

– Hadoop Distributed File System (HDFS)

– MapReduce engine

Hadoop distributed file system (HDFS)

• A file system for Hadoop that runs on top of the existing file system on each node

• Designed to handle very large files with streaming data access patterns

• Uses blocks to store a file or parts of a file

HDFS - Blocks

• File blocks

– 64MB (default), 128MB (recommended), compared to 4KB in UNIX

– Behind the scenes, 1 HDFS block is supported by multiple operating system (OS) blocks

• Advantages of blocks:

– Fixed size: easy to calculate how many fit on a disk

– A file can be larger than any single disk in the network

– If a file or a chunk of the file is smaller than the block size, only the needed space is used. For example, a 420MB file is split as 128MB + 128MB + 128MB + 36MB

– Fits well with replication to provide fault tolerance and availability

Diagram: one 128 MB HDFS block is backed by multiple OS blocks.
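The 420MB example can be reproduced with a few lines of JavaScript, the language the deck uses later for its map and reduce examples. This is only a sketch of the arithmetic: splitIntoBlocks is a hypothetical helper, not part of any Hadoop API, and sizes are plain numbers in MB.

```javascript
// Sketch: how a file of a given size is cut into fixed-size blocks.
// Sizes are in MB; 128 MB is the block size used in the slide's example.
function splitIntoBlocks(fileSizeMB, blockSizeMB) {
  const blocks = [];
  let remaining = fileSizeMB;
  while (remaining > 0) {
    // Every block is full-sized except possibly the last one,
    // which only takes up the space it actually needs.
    const size = Math.min(blockSizeMB, remaining);
    blocks.push(size);
    remaining -= size;
  }
  return blocks;
}

console.log(splitIntoBlocks(420, 128)); // [ 128, 128, 128, 36 ]
```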

HDFS - Replication

• Blocks with data are replicated to multiple nodes

• Allows for node failure without data loss

Diagram: blocks replicated across Node 1, Node 2, and Node 3.
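To illustrate why replication allows for node failure without data loss, the sketch below reports which blocks would become unreadable after a given set of nodes fails. The lostBlocks helper and the node and block names are hypothetical, and the sketch ignores what real HDFS does next, such as re-replicating the surviving copies.

```javascript
// Sketch: which blocks are lost when some nodes fail?
// A block survives as long as at least one replica sits on a live node.
function lostBlocks(blockLocations, failedNodes) {
  const failed = new Set(failedNodes);
  return Object.keys(blockLocations).filter((block) =>
    blockLocations[block].every((node) => failed.has(node))
  );
}

// Hypothetical layouts: three replicas per block vs. a single copy per block.
const replicated = {
  b1: ["node1", "node2", "node3"],
  b2: ["node1", "node2", "node3"],
};
const singleCopy = {
  b1: ["node1"],
  b2: ["node2"],
};

console.log(lostBlocks(replicated, ["node1"]));          // []
console.log(lostBlocks(singleCopy, ["node1"]));          // [ 'b1' ]
console.log(lostBlocks(replicated, ["node1", "node2"])); // []
```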

MapReduce engine

• Technology from Google

• A MapReduce program consists of map and reduce functions

• A MapReduce job is broken into tasks that run in parallel

Types of nodes - Overview

• HDFS nodes

– NameNode

– DataNode

• MapReduce nodes

– JobTracker

– TaskTracker

• There are other nodes not discussed in this course

Types of nodes - NameNode

• NameNode

– Only one per Hadoop cluster

– Manages the filesystem namespace and metadata

– Single point of failure, but mitigated by writing state to multiple filesystems

– Because it is a single point of failure and has large memory requirements, don't use inexpensive commodity hardware for this node

Types of nodes - DataNode

• DataNode

– Many per Hadoop cluster

– Manages blocks with data and serves them to clients

– Periodically reports to the NameNode the list of blocks it stores

– Use inexpensive commodity hardware for this node
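These block reports are what let the NameNode know where each block lives; HDFS does not persist block locations and rebuilds them from the reports. Below is a minimal sketch, assuming a made-up report format (DataNode name mapped to a list of block IDs); buildBlockMap and the IDs are hypothetical and not HDFS APIs.

```javascript
// Sketch: rebuilding a block -> DataNodes map from DataNode block reports.
function buildBlockMap(blockReports) {
  const locations = new Map(); // block ID -> DataNodes that reported it
  for (const [dataNode, blockIds] of Object.entries(blockReports)) {
    for (const id of blockIds) {
      if (!locations.has(id)) locations.set(id, []);
      locations.get(id).push(dataNode);
    }
  }
  return locations;
}

// Hypothetical reports: each DataNode lists the blocks it currently stores.
const reports = {
  dn1: ["blk_1", "blk_2"],
  dn2: ["blk_1"],
  dn3: ["blk_2", "blk_1"],
};
const blockMap = buildBlockMap(reports);
console.log(blockMap.get("blk_1")); // [ 'dn1', 'dn2', 'dn3' ]
console.log(blockMap.get("blk_2")); // [ 'dn1', 'dn3' ]
```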

Types of nodes - JobTracker

• JobTracker node

– One per Hadoop cluster

– Receives job requests submitted by clients

– Schedules and monitors MapReduce jobs on TaskTrackers

Types of nodes - TaskTracker

• TaskTracker node

– Many per Hadoop cluster

– Executes MapReduce operations

– Reads blocks from DataNodes

…lesson continued in the next video.

Topology awareness

Bandwidth becomes progressively smaller in the following scenarios:

1. Process on the same node

2. Different nodes on the same rack

3. Nodes on different racks in the same data center

4. Nodes in different data centers
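Hadoop's topology awareness is often explained with a network distance that grows with each of the four scenarios: 0 for the same node, 2 for the same rack, 4 for different racks in the same data center, and 6 for different data centers. The sketch below computes that distance from paths of the form /datacenter/rack/node; the path names and the distance function are illustrative assumptions, not Hadoop's own NetworkTopology code.

```javascript
// Sketch of a topology-aware distance between two nodes: count the hops
// from each node up to their closest common ancestor in the tree.
// Locations are written as /datacenter/rack/node (hypothetical names).
function distance(locationA, locationB) {
  const a = locationA.split("/").filter(Boolean);
  const b = locationB.split("/").filter(Boolean);
  // How many leading components (data center, rack, node) match.
  let common = 0;
  while (common < a.length && common < b.length && a[common] === b[common]) {
    common++;
  }
  // Hops from each node up to the shared ancestor.
  return (a.length - common) + (b.length - common);
}

console.log(distance("/d1/r1/n1", "/d1/r1/n1")); // 0: same node
console.log(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: same rack
console.log(distance("/d1/r1/n1", "/d1/r2/n3")); // 4: same data center
console.log(distance("/d1/r1/n1", "/d2/r3/n4")); // 6: different data centers
```

A smaller distance means more available bandwidth, so a scheduler can prefer the replica or node with the lowest distance.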

Writing a file to HDFS

Thank You

What is Hadoop?

Agenda

• What is Hadoop?

• What is Big Data?

• Hadoop-related open source projects

• Examples of Hadoop in action

• Big Data solutions and the Cloud

What is Hadoop?

Diagram: a relational database alongside data volumes that grow from 1GB, 10GB, and 100GB to 1TB, 10TB, and 100TB, with data coming from sources such as RFIDs, sensors, Facebook, and Twitter.

What is Hadoop?

• Open source project

• Written in Java

• Optimized to handle

– Massive amounts of data through parallelism

– A variety of data (structured, unstructured, semi-structured)

– Using inexpensive commodity hardware

• Great performance

• Reliability provided through replication

• Not for OLTP, not for OLAP/DSS, good for Big Data

• Current version: 0.20.2

What is Big Data?

• RFID readers

• 2 billion internet users

• 4.6 billion mobile phones

• 7TB of data processed by Twitter every day

• 10TB of data processed by Facebook every day

• About 80% of this data is unstructured

Examples of Hadoop in action – IBM Watson

Examples of Hadoop in action

• In the telecommunication industry

• In the media

• In the technology industry

Hadoop is not for all types of work

• Not for processing transactions (random access)

• Not good when work cannot be parallelized

• Not good for low latency data access

• Not good for processing lots of small files

• Not good for intensive calculations with little data

Big Data solutions and the Cloud

• Big Data solutions are more than just Hadoop

– Add business intelligence/analytics functionality

– Derive information from data in motion

• Big Data solutions and the Cloud are a perfect fit

– The Cloud allows you to set up a cluster of systems in minutes, and it's relatively inexpensive

Thank You

HDFS – Command Line

Agenda

• HDFS Command Line Interface

• Examples

HDFS Command line interface

• File System Shell (fs)

• Invoked as follows:

hadoop fs <args>

• Example:

Listing the current directory in HDFS

hadoop fs -ls .

HDFS Command line interface

• FS shell commands take path URIs as arguments

• URI format:

scheme://authority/path

• Scheme:

• For the local filesystem, the scheme is file

• For HDFS, the scheme is hdfs

hadoop fs -copyFromLocal file://myfile.txt hdfs://localhost/user/keith/myfile.txt

• Scheme and authority are optional

• Defaults are taken from configuration file core-site.xml
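The sketch below shows how a path without a scheme might be expanded into the full scheme://authority/path form. It assumes a default file system of hdfs://localhost and a home directory of /user/keith, both stand-ins for values that would come from core-site.xml (the fs.default.name property, called fs.defaultFS in newer releases); relative paths, such as the "." in hadoop fs -ls ., resolve against the user's home directory. qualify is a hypothetical helper, not the FS shell's actual code.

```javascript
// Sketch: expanding a path into a full scheme://authority/path URI.
// Both constants are assumptions standing in for core-site.xml settings.
const DEFAULT_FS = "hdfs://localhost"; // hypothetical default file system
const HOME_DIR = "/user/keith";        // hypothetical user home directory

function qualify(path) {
  // Already has a scheme (e.g. hdfs:// or file://): leave it unchanged.
  if (/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//.test(path)) {
    return path;
  }
  // Absolute path: add the default scheme and authority.
  if (path.startsWith("/")) {
    return DEFAULT_FS + path;
  }
  // Relative path: resolve against the home directory.
  return DEFAULT_FS + HOME_DIR + "/" + path;
}

console.log(qualify("hdfs://localhost/user/keith/myfile.txt")); // unchanged
console.log(qualify("/user/keith/myfile.txt")); // hdfs://localhost/user/keith/myfile.txt
console.log(qualify("myfile.txt"));             // hdfs://localhost/user/keith/myfile.txt
```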

HDFS Command line interface

• Many POSIX-like commands

• cat, chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, stat, tail

• Some HDFS-specific commands

• copyFromLocal, copyToLocal, get, getmerge, put, setrep

HDFS – Specific commands

• copyFromLocal / put

• Copy files from the local file system into fs

hadoop fs -copyFromLocal <localsrc> ... <dst>

Or

hadoop fs -put <localsrc> ... <dst>

HDFS – Specific commands

• copyToLocal / get

• Copy files from fs into the local file system

hadoop fs -copyToLocal [-ignorecrc] [-crc] <src> <localdst>

Or

hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>

HDFS – Specific commands

• getmerge

• Get all the files in the directories that match the source file pattern

• Concatenate them into a single file on the local fs

• <src> is kept

hadoop fs -getmerge <src> <localdst>

HDFS – Specific commands

• setrep

• Sets the replication level of a file.

• The -R flag requests a recursive change of replication level for an entire tree.

• If -w is specified, the command waits until the new replication level is achieved.

hadoop fs -setrep [-R] [-w] <rep> <path/file>

Thank You

Hadoop MapReduce

Agenda

• Map operations

• Reduce operations

• Submitting a MapReduce job

• Distributed Mergesort Engine

• Two fundamental data types

• Fault tolerance

• Scheduling

• Task execution

What is a Map operation?

• Doing something to every element in an array is a common operation:

var a = [1,2,3];
for (var i = 0; i < a.length; i++)
    a[i] = a[i] * 2;

• New value for variable a would be:

var a = [2,4,6];

The loop body can be written as a function call, like this:

a[i] = fn(a[i]);

where fn is a function defined as:

function fn(x) {return x*2;}

What is a Map operation?

• Doing something to every element in an array is a common operation:

var a = [1,2,3];
for (var i = 0; i < a.length; i++)
    a[i] = fn(a[i]);

Now, all of this can also be converted into a "map" function…

• …like this, where fn is a function passed as an argument:

function map(fn, a) {
    for (var i = 0; i < a.length; i++)
        a[i] = fn(a[i]);
}

• You can invoke this map function like this:

map(function(x){return x*2;}, a);

Here, fn is the anonymous function(x){return x*2;}, whose definition is included directly in the call.

What is a Map operation?

• In summary, you can now rewrite:

for (var i = 0; i < a.length; i++)
    a[i] = a[i] * 2;

as a map operation:

map(function(x){return x*2;}, a);
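The map function from the slides runs as-is in JavaScript. For comparison, the sketch also uses the language's built-in Array.prototype.map, which returns a new array instead of overwriting the input.

```javascript
// The hand-written map from the slides: it changes the array in place.
function map(fn, a) {
  for (var i = 0; i < a.length; i++)
    a[i] = fn(a[i]);
}

var a = [1, 2, 3];
map(function (x) { return x * 2; }, a);
console.log(a); // [ 2, 4, 6 ]

// The built-in Array.prototype.map applies the same function but
// returns a new array and leaves the original untouched.
var b = [1, 2, 3];
var doubled = b.map(function (x) { return x * 2; });
console.log(doubled); // [ 2, 4, 6 ]
console.log(b);       // [ 1, 2, 3 ]
```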

What is a Reduce operation?

• Another common operation on arrays is to combine all their values:

function sum(a) {
    var s = 0;
    for (var i = 0; i < a.length; i++)
        s += a[i];
    return s;
}

The combining step can be written as a function call, like this:

function sum(a) {
    var s = 0;
    for (var i = 0; i < a.length; i++)
        s = fn(s, a[i]);
    return s;
}

where function fn is defined so that it adds its arguments:

function fn(a,b){ return a+b; }

The whole function sum can also be rewritten so that fn is passed as an argument.

What is a Reduce operation?

function reduce(fn, a, init) {
    var s = init;
    for (var i = 0; i < a.length; i++)
        s = fn(s, a[i]);
    return s;
}

Like this: the function name was changed to reduce, and it now takes three arguments: a function, an array, and an initial value.

What is a Reduce operation?

• In summary, you can now rewrite:

function sum(a) {
    var s = 0;
    for (var i = 0; i < a.length; i++)
        s += a[i];
    return s;
}

as a reduce operation:

reduce(function(a,b){return a+b;},a,0);
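Likewise, the reduce function from the slides can be run directly, and JavaScript's built-in Array.prototype.reduce takes the same combining function and initial value. Swapping fn changes what the reduction computes, as the last line shows with a maximum instead of a sum.

```javascript
// The hand-written reduce from the slides.
function reduce(fn, a, init) {
  var s = init;
  for (var i = 0; i < a.length; i++)
    s = fn(s, a[i]);
  return s;
}

var values = [1, 2, 3, 4];
console.log(reduce(function (a, b) { return a + b; }, values, 0)); // 10

// Built-in equivalent: Array.prototype.reduce(callback, initialValue).
console.log(values.reduce(function (a, b) { return a + b; }, 0)); // 10

// Same reduce, different combining function: the maximum value.
console.log(reduce(function (a, b) { return Math.max(a, b); }, values, -Infinity)); // 4
```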

…lesson continued in the next video.

Submitting a MapReduce job

…lesson continued in the next video.

MapReduce – Distributed Mergesort Engine

…lesson continued in the next video.

Two Fundamental data types

• Key/value pairs

• Lists

          Input             Output
map       <k1, v1>          list(<k2, v2>)
reduce    <k2, list(v2)>    list(<k3, v3>)
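The signatures above can be traced through a small word count, a common first MapReduce example. This is a single-process sketch, not Hadoop code: mapFn, reduceFn, and runJob are hypothetical names, the line index stands in for the byte offset Hadoop would normally use as k1, and the grouping step imitates the shuffle and sort the framework performs between map and reduce.

```javascript
// map:    <k1, v1>       -> list(<k2, v2>)   (offset, line) -> [(word, 1), ...]
// reduce: <k2, list(v2)> -> list(<k3, v3>)   (word, [1, 1]) -> [(word, 2)]

function mapFn(offset, line) {
  // Emit <word, 1> for every word in the line.
  return line.split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

function reduceFn(word, counts) {
  // Add up all the 1s emitted for this word.
  return [[word, counts.reduce((s, c) => s + c, 0)]];
}

function runJob(lines) {
  // Map phase: one call per input record; the index stands in for k1.
  const intermediate = lines.flatMap((line, offset) => mapFn(offset, line));
  // Shuffle and sort: group all values by key, then order the keys.
  const groups = new Map();
  for (const [k, v] of intermediate) {
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(v);
  }
  const keys = [...groups.keys()].sort();
  // Reduce phase: one call per key with the list of its values.
  return keys.flatMap((k) => reduceFn(k, groups.get(k)));
}

console.log(runJob(["the cat", "the dog"]));
// [ [ 'cat', 1 ], [ 'dog', 1 ], [ 'the', 2 ] ]
```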

Simple data flow example

…lesson continued in the next video.

Fault tolerance

• Task Failure

– If a child task fails, the child JVM reports to the TaskTracker before it exits. The attempt is marked as failed, freeing up its slot for another task.

– If the child task hangs, it is killed. The JobTracker reschedules the task on another machine.

– If the task continues to fail, the job is failed.
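The task-failure rules amount to a bounded retry loop. In this sketch a thrown exception plays the role of a failed attempt, and maxAttempts defaults to 4, matching the default of Hadoop 1.x's mapred.map.max.attempts and mapred.reduce.max.attempts settings. runTask and the status strings are hypothetical, and rescheduling on another machine is not modeled.

```javascript
// Sketch of bounded task retries. Each call to task() is one attempt; a thrown
// error marks that attempt as failed, and another attempt is started.
function runTask(task, maxAttempts = 4) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = task(attempt);
      return { status: "SUCCEEDED", attempt, result };
    } catch (err) {
      console.log(`attempt ${attempt} failed: ${err.message}`);
    }
  }
  // The task kept failing, so the whole job is marked as failed.
  return { status: "JOB_FAILED", attempt: maxAttempts };
}

// Fails on its first two attempts (a transient problem), then succeeds.
const flakyTask = (attempt) => {
  if (attempt < 3) throw new Error("transient error");
  return "done";
};
// Always fails, for example because of a bad input record.
const brokenTask = () => {
  throw new Error("bad record");
};

console.log(runTask(flakyTask).status);  // SUCCEEDED
console.log(runTask(brokenTask).status); // JOB_FAILED
```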

Fault tolerance

• TaskTracker Failure

– The JobTracker receives no heartbeat

– It removes the TaskTracker from the pool of TaskTrackers to schedule tasks on.

• JobTracker Failure

– Single point of failure. The job fails.
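TaskTracker failure detection is based on missing heartbeats. The sketch below keeps only the trackers that reported within an expiry window; the 10-minute window is assumed to match the default of Hadoop 1.x's mapred.tasktracker.expiry.interval, and the tracker names and timestamps are invented.

```javascript
// Sketch of heartbeat-based failure detection for TaskTrackers.
const EXPIRY_MS = 10 * 60 * 1000; // assumed 10-minute expiry window

// Returns the trackers whose last heartbeat falls inside the expiry window;
// any other tracker is treated as failed and no longer gets new tasks.
function liveTrackers(lastHeartbeat, now, expiryMs = EXPIRY_MS) {
  return Object.keys(lastHeartbeat).filter(
    (tracker) => now - lastHeartbeat[tracker] <= expiryMs
  );
}

const now = 1000000000000; // arbitrary "current time" in milliseconds
const heartbeats = {
  tt1: now - 3 * 1000,       // heard from 3 seconds ago
  tt2: now - 15 * 60 * 1000, // silent for 15 minutes: considered failed
  tt3: now - 9 * 60 * 1000,  // silent for 9 minutes: still within the window
};
console.log(liveTrackers(heartbeats, now)); // [ 'tt1', 'tt3' ]
```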

…lesson continued in the next video.

Scheduling

• FIFO scheduler (with priorities)

– Each job uses the whole cluster, so jobs wait their turn.

• Fair scheduler

– Jobs are placed in pools. A user who submits more jobs than another user will not, on average, get more cluster resources than that user. Custom pools can be defined with a guaranteed minimum capacity.

• Capacity scheduler

– Allows Hadoop to simulate, for each user, a separate MapReduce cluster with FIFO scheduling.
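The difference between FIFO-with-priorities ordering and fair sharing can be sketched in a few lines. Both functions are simplified illustrations with hypothetical names and data: the priority names follow Hadoop's job priorities (VERY_HIGH through VERY_LOW), and fairShares uses a plain max-min fair split of slots between users, not the Fair Scheduler's actual algorithm with pools, minimum shares, weights, and preemption.

```javascript
// FIFO with priorities: higher priority first, then earlier submission time.
const PRIORITY = { VERY_HIGH: 0, HIGH: 1, NORMAL: 2, LOW: 3, VERY_LOW: 4 };

function fifoOrder(jobs) {
  return [...jobs].sort(
    (x, y) =>
      PRIORITY[x.priority] - PRIORITY[y.priority] || x.submitted - y.submitted
  );
}

// Max-min fair split of slots between users: each round, every user who still
// wants more gets an equal slice; slots a user cannot use go back to the pool.
function fairShares(demands, totalSlots) {
  const shares = {};
  for (const user of Object.keys(demands)) shares[user] = 0;
  let remaining = totalSlots;
  let active = Object.keys(demands).filter((u) => demands[u] > 0);
  while (remaining > 0 && active.length > 0) {
    const slice = Math.floor(remaining / active.length);
    if (slice === 0) break; // fewer slots left than users; stop for simplicity
    for (const user of active) {
      const grant = Math.min(slice, demands[user] - shares[user]);
      shares[user] += grant;
      remaining -= grant;
    }
    active = active.filter((u) => shares[u] < demands[u]);
  }
  return shares;
}

const jobs = [
  { name: "etl", priority: "NORMAL", submitted: 1 },
  { name: "report", priority: "HIGH", submitted: 2 },
  { name: "backup", priority: "NORMAL", submitted: 0 },
];
console.log(fifoOrder(jobs).map((j) => j.name)); // [ 'report', 'backup', 'etl' ]

// alice has far more work queued than bob, but they still split 8 slots evenly.
console.log(fairShares({ alice: 20, bob: 5 }, 8)); // { alice: 4, bob: 4 }
// bob only needs 2 slots, so the rest go to alice.
console.log(fairShares({ alice: 10, bob: 2 }, 8)); // { alice: 6, bob: 2 }
```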

Task execution

• Speculative Execution

– Job execution time is sensitive to slow-running tasks. Hadoop detects slow-running tasks and launches another, equivalent task as a backup. The output from whichever of these tasks finishes first is used.

• Task JVM Reuse

– Tasks run in their own JVMs for isolation. Jobs that have a large number of short-lived tasks, or tasks with lengthy initialization, can benefit from sequential JVM reuse, enabled through configuration.
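Detecting slow-running tasks needs some notion of "slow". The sketch below flags a task when its progress is more than 0.2 below the average of its peers and it has run for at least 60 seconds; those thresholds are assumptions modeled on published descriptions of early Hadoop's heuristic, and speculativeCandidates and the task data are hypothetical.

```javascript
// Sketch of picking tasks that deserve a speculative backup copy.
// progress is a fraction from 0 to 1; runtimeSec is how long the task has run.
function speculativeCandidates(tasks, gap = 0.2, minRuntimeSec = 60) {
  const avg = tasks.reduce((s, t) => s + t.progress, 0) / tasks.length;
  return tasks
    // Well behind its peers AND not just started: treat as a straggler.
    .filter((t) => t.progress < avg - gap && t.runtimeSec >= minRuntimeSec)
    .map((t) => t.id);
}

const tasks = [
  { id: "m_000", progress: 0.9, runtimeSec: 120 },
  { id: "m_001", progress: 0.85, runtimeSec: 120 },
  { id: "m_002", progress: 0.3, runtimeSec: 120 }, // slow: gets a backup copy
  { id: "m_003", progress: 0.1, runtimeSec: 20 },  // just started: left alone
];
console.log(speculativeCandidates(tasks)); // [ 'm_002' ]
```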

Thank You

Pig, Hive, and JAQL

Agenda

• Overview

• Pig

• Hive

• Jaql

Similarities of Pig, Hive and Jaql

All translate their respective high-level languages to MapReduce jobs

All offer significant reductions in program size over Java

All provide points of extension to cover gaps in functionality

All provide interoperability with other languages

None support random reads/writes or low-latency queries

Page 150: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Comparing Pig, Hive, and Jaql

Feature | Pig | Hive | Jaql
Developed by | Yahoo! | Facebook | IBM
Language name | Pig Latin | HiveQL | Jaql
Type of language | Data flow | Declarative (SQL dialect) | Data flow
Data structures it operates on | Complex | Geared towards structured data | Loosely structured data, JSON
Schema optional? | Yes | No, but data can have many schemas | Yes
Turing complete? | Yes, when extended with Java UDFs | Yes, when extended with Java UDFs | Yes

Page 151: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Agenda

• Overview

• Pig

• Hive

• Jaql

Page 152: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig components

• Two components

Language (called Pig Latin)

Compiler

• Two execution environments

Local (single JVM)

pig -x local

Distributed (Hadoop cluster)

pig -x mapreduce, or simply pig

[Diagram: Pig consists of the Pig Latin language and a compiler, and runs in either a local or a distributed execution environment.]

Page 153: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Running Pig

Script

pig scriptfile.pig

Grunt (interactive command line)

pig (launches the Grunt shell)

Embedded

Call into Pig from Java (for example, through the PigServer class)

Page 154: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin sample code

$ pig

grunt> records = LOAD 'econ_assist.csv'
           USING PigStorage(',')
           AS (country:chararray, sum:long);
grunt> grouped = GROUP records BY country;
grunt> thesum = FOREACH grouped
           GENERATE group, SUM(records.sum);
grunt> DUMP thesum;
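To see what the Grunt session computes, the GROUP and FOREACH...GENERATE steps can be mirrored in plain Python. This only illustrates the semantics, not how Pig executes them; the rows in SAMPLE_CSV are made-up placeholders, since the contents of econ_assist.csv are not shown here.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows in the (country, sum) layout declared by the AS clause.
SAMPLE_CSV = "alpha,100\nbeta,40\nalpha,25\nbeta,10\ngamma,5\n"

# LOAD ... USING PigStorage(',') AS (country:chararray, sum:long)
records = [(country, int(amount))
           for country, amount in csv.reader(io.StringIO(SAMPLE_CSV))]

# GROUP records BY country: one (group, bag) pair per distinct key,
# where the bag holds every input tuple that has that key.
grouped = defaultdict(list)
for rec in records:
    grouped[rec[0]].append(rec)

# FOREACH grouped GENERATE group, SUM(records.sum)
thesum = [(group, sum(r[1] for r in bag)) for group, bag in grouped.items()]

# DUMP thesum
for row in thesum:
    print(row)
```

The key detail is that GROUP produces a bag per key, so SUM has to be applied to the projected field records.sum rather than to the relation and field as separate arguments.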

Page 155: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Statements, operations & commands

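A Pig Latin program is a sequence of statements, for example:

LOAD 'input.txt';   (an operation used as a statement)
ls *.txt            (a command used as a statement)
DUMP …              (an operation that triggers execution)

Pig builds a logical plan from the statements, compiles it into a physical plan, and then executes it.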
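The flow from statements to a logical plan, a physical plan, and execution implies that entering LOAD or FILTER does not run anything yet; the statements only extend the plan, and an output operator such as DUMP triggers execution. The toy class below imitates that deferred evaluation. LazyPlan, add, and dump are hypothetical names, and the sketch does not model the logical-to-physical compilation step.

```python
class LazyPlan:
    """Toy stand-in for a Pig plan: steps are recorded, not run."""

    def __init__(self):
        self.steps = []       # recorded (description, function) pairs
        self.executions = 0   # how many times the plan has actually run

    def add(self, description, fn):
        # Like typing LOAD or FILTER in Grunt: nothing is computed yet.
        self.steps.append((description, fn))
        return self

    def dump(self):
        # Like DUMP: run every recorded step in order and return the result.
        self.executions += 1
        data = None
        for _description, fn in self.steps:
            data = fn(data)
        return data

plan = LazyPlan()
plan.add("LOAD", lambda _: [(1, 1987), (2, 2003), (3, 1999)])
plan.add("FILTER year > 1990", lambda rows: [r for r in rows if r[1] > 1990])
print(plan.executions)  # still 0: only the plan has been built
print(plan.dump())
```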

Page 156: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin statements

UDF Statements

REGISTER, DEFINE

Commands

Hadoop Filesystem (cat, ls, etc.)

Hadoop MapReduce (kill)

Utility (exec, help, quit, run, set)

Operators

Diagnostic: DESCRIBE, EXPLAIN, ILLUSTRATE

Relational: LOAD, STORE, DUMP, FILTER, etc.

Page 157: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Relational operators

Loading and storing, e.g. LOAD (into a program), STORE (to disk), DUMP (to the screen)

Filtering, e.g. FILTER, DISTINCT, FOREACH...GENERATE, STREAM, SAMPLE

Grouping and joining, e.g. JOIN, COGROUP, GROUP, CROSS

Sorting, e.g. ORDER, LIMIT

Combining and splitting, e.g. UNION, SPLIT
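To make the grouping and joining category concrete: an inner JOIN on a key pairs each tuple of one relation with every matching tuple of the other and concatenates their fields, while COGROUP produces one tuple per key that holds the matching tuples from each input in separate bags. The Python sketch below imitates both on small in-memory lists. The relations and the helper names join and cogroup are hypothetical, and Pig itself would run these operators as MapReduce jobs.

```python
from collections import defaultdict

def join(left, right, lkey=0, rkey=0):
    """Inner equi-join, like: JOIN left BY $0, right BY $0."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)
    # Each output tuple concatenates the fields of both matching inputs.
    return [l + r for l in left for r in index.get(l[lkey], [])]

def cogroup(left, right, lkey=0, rkey=0):
    """Like: COGROUP left BY $0, right BY $0 -> (key, left_bag, right_bag)."""
    groups = defaultdict(lambda: ([], []))
    for l in left:
        groups[l[lkey]][0].append(l)
    for r in right:
        groups[r[rkey]][1].append(r)
    return [(k, lbag, rbag) for k, (lbag, rbag) in groups.items()]

owners = [("alice", 1), ("bob", 2), ("carol", 3)]
pets = [("alice", "cat"), ("alice", "dog"), ("dave", "fish")]
print(join(owners, pets))
print(cogroup(owners, pets))
```

Note the difference in what survives: JOIN drops bob, carol, and dave because they have no match, whereas COGROUP keeps every key and leaves the unmatched side as an empty bag.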

Page 158: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Relations and schema

The result of a relational operator is a relation

A relation is a bag (an unordered collection) of tuples

Relations can be named using an alias (e.g. "x")

x = LOAD 'sample.txt' AS (id:int, year:int);

DUMP x;

Each line of output is a tuple, e.g. (1,1987)

Page 159: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Relations and schema

The structure of a relation is its schema

Use the DESCRIBE operator to see the schema, e.g.:

DESCRIBE x;

The output is the schema:

x: {id: int, year: int}

Page 160: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin expressions

Statements that contain relational operators may also contain expressions.

Kinds of expressions: Constant, Field, Projection, Map lookup, Cast, Arithmetic, Conditional, Boolean, Comparison, Functional, Flatten

Page 161: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Data types

• Simple types: int, long, float, double, chararray, bytearray

• Complex types:

Tuple – Sequence of fields of any type

Bag – Unordered collection of tuples

Map – Set of key-value pairs. Keys must be chararray.

Page 162: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Function types

Eval

Input: One or more expressions

Output: An expression

Example: MAX

Filter

Input: Bag or map

Output: boolean

Example: IsEmpty

Page 163: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – Function types

Load

Input: Data from external storage

Output: A relation

Example: PigStorage

Store

Input: A relation

Output: Data to external storage

Example: PigStorage

Page 164: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Pig Latin – User-Defined Functions

• Written in Java

Packaged in a JAR file

Register JAR file using the REGISTER statement

Optionally, give the function an alias with the DEFINE statement

Page 165: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Agenda

• Overview

• Pig

• Hive

• Jaql

Page 166: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive architecture

[Diagram: Hive architecture. Clients (CLI, JDBC/ODBC, Web Interface) send DDL and queries to the parser, planner, and optimizer, which use the Metastore (a relational database for metadata) and run the work on Hadoop.]

Page 167: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Running Hive

Hive Shell

Interactive

hive

Script

hive -f myscript

Inline

hive -e 'SELECT * FROM mytable'

Page 168: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive services

hive --service servicename

where servicename can be:

hiveserver

Server for Thrift, JDBC, and ODBC clients

hwi

Web interface

jar

Runs hadoop jar with the Hive JARs on the classpath

metastore

Runs the metastore out of process

Page 169: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive - Metastore

Stores Hive metadata

Configurations

Embedded

in-process metastore, in-process database

Local

in-process metastore, out-of-process database

Remote

out-of-process metastore, out-of-process database

Page 170: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive – Schema-On-Read

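Faster loads into the database: data files are simply copied or moved, with no parsing or validation at load time

Slower queries: data is parsed and checked against the schema every time it is read

Flexibility: multiple schemas can be applied to the same data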
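Schema-on-read means the same stored bytes can be interpreted differently by each table definition, at query time. The following Python sketch assumes a hypothetical comma-delimited file (RAW_FILE) and two hypothetical schemas; read_with_schema is an illustrative helper, not a Hive API. "Loading" is just keeping the raw text, and parsing, including any failure to parse, happens only when a query reads the data. Turning an unconvertible field into None mirrors how Hive's default delimited-text handling yields NULL for values that do not match the declared type.

```python
# "Loading" under schema-on-read: the raw text is stored as-is, unchecked.
RAW_FILE = "alpha,100\nbeta,n/a\ngamma,7\n"

def read_with_schema(raw, schema):
    """Apply a schema [(column, type_fn), ...] to each line at read time.

    A field that cannot be converted becomes None, similar to the NULL
    Hive returns for values that do not match the declared column type.
    """
    rows = []
    for line in raw.splitlines():
        fields = line.split(",")
        row = {}
        for (column, type_fn), value in zip(schema, fields):
            try:
                row[column] = type_fn(value)
            except ValueError:
                row[column] = None
        rows.append(row)
    return rows

# Two different schemas over the same bytes.
numeric_view = read_with_schema(RAW_FILE, [("country", str), ("amount", int)])
text_view = read_with_schema(RAW_FILE, [("country", str), ("note", str)])
print(numeric_view)
print(text_view)
```

The cost of the fast load shows up here: every query repeats the parsing, and bad values are discovered only when they are read.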

Page 171: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive - Configuration

• Three ways to configure Hive:

• hive-site.xml

- fs.default.name

- mapred.job.tracker

- Metastore configuration settings

• hive --hiveconf property=value

• The SET command in the Hive shell
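When the same property is set in more than one place, the more specific setting wins: a SET in the shell overrides --hiveconf, which in turn overrides hive-site.xml. The Python sketch below models that precedence with layered dictionaries. The property values are hypothetical examples, and the sketch ignores the lower layers (hive-default.xml and the Hadoop configuration files) that sit beneath hive-site.xml.

```python
from collections import ChainMap

# Each layer as a dict; the values shown are hypothetical examples.
hive_site_xml = {"mapred.job.tracker": "jt-host:9001",
                 "hive.exec.parallel": "false"}
hiveconf_args = {"hive.exec.parallel": "true"}  # hive --hiveconf hive.exec.parallel=true
set_commands = {}                               # filled by SET inside the shell

# ChainMap searches its maps left to right, so list the highest precedence first.
effective = ChainMap(set_commands, hiveconf_args, hive_site_xml)
print(effective["hive.exec.parallel"])  # value from --hiveconf

set_commands["hive.exec.parallel"] = "false"  # hive> SET hive.exec.parallel=false;
print(effective["hive.exec.parallel"])  # the SET value now wins
print(effective["mapred.job.tracker"])  # defined only in hive-site.xml
```

ChainMap keeps references to the underlying dicts, so a later SET is visible immediately without rebuilding the lookup.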

Page 172: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive Query Language (HiveQL)

SQL dialect

Does not support the full SQL-92 specification

No support for:

HAVING clause in SELECT (in Hive releases before 0.7.0; later releases support HAVING)

Correlated subqueries

Subqueries outside FROM clauses

Updatable or materialized views

Stored procedures

Page 173: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Sample code

$ hive

hive> CREATE TABLE foreign_aid
        (country STRING, sum BIGINT)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;
hive> SHOW TABLES;
hive> DESCRIBE foreign_aid;
hive> LOAD DATA INPATH 'econ_assist.csv'
      OVERWRITE INTO TABLE foreign_aid;
hive> SELECT * FROM foreign_aid LIMIT 10;
hive> SELECT country, SUM(sum) FROM foreign_aid
      GROUP BY country;

Page 174: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive Query Language (HiveQL)

Extensions

MySQL-like extensions

MapReduce extensions

Multi-table insert, MAP, REDUCE, TRANSFORM clauses

Data Types

Simple

TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, BOOLEAN, STRING

Complex

ARRAY, MAP, STRUCT

Page 175: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive Query Language (HiveQL)

Built-in functions

SHOW FUNCTIONS lists them

DESCRIBE FUNCTION function_name describes one of them

Page 176: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hive – User-Defined Functions

Written in Java

Three UDF types:

UDF

Input: single row, output: single row

UDAF

Input: multiple rows, output: single row

UDTF

Input: single row, output: multiple rows

Register the UDF's JAR file using ADD JAR

Create an alias for the function using CREATE TEMPORARY FUNCTION
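The three UDF kinds differ in how many rows go in and how many come out. The Python functions below are hypothetical analogues that only show those shapes; they are not Hive's Java UDF, UDAF, or UDTF interfaces. They behave in the spirit of Hive's built-in upper, sum, and explode functions.

```python
def udf_upper(value):
    """UDF shape: one row in, one row out (cf. Hive's built-in upper)."""
    return value.upper()

def udaf_total(values):
    """UDAF shape: many rows in, one row out (cf. Hive's built-in sum)."""
    total = 0
    for v in values:
        total += v
    return total

def udtf_explode(items):
    """UDTF shape: one row in, many rows out (cf. Hive's built-in explode)."""
    for item in items:
        yield item

# Hypothetical rows of (name, amount, tags).
rows = [("alpha", 3, ["x", "y"]), ("beta", 4, ["z"])]
upper_names = [udf_upper(name) for name, _, _ in rows]           # one per row
grand_total = udaf_total(amount for _, amount, _ in rows)        # one overall
all_tags = [t for _, _, tags in rows for t in udtf_explode(tags)]  # several per row
print(upper_names, grand_total, all_tags)
```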

Page 177: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Agenda

• Overview

• Pig

• Hive

• Jaql

Page 178: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql architecture

[Diagram: Jaql architecture. The interactive shell, applications, and scripts sit on top of the compiler / parser / rewriter; an I/O layer connects it to the storage layer: file systems (HDFS, GPFS, local), databases (DBMS, HBase), and streams (Web, pipes).]

Page 179: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql data model: JSON

JSON = JavaScript Object Notation

Flexible (schema is optional)

Powerful modeling for semi-structured data

Popular exchange format

Page 180: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

JSON example

[
  {"ACCT_NUM": 18, "AUTH_DATE": "2011-01-29",
   "AUTH_AMT": "111.11", "ZIP": 98765, "MERCH_NAME": "Acme"},
  {"ACCT_NUM": 19, "AUTH_DATE": "2011-01-29",
   "AUTH_AMT": "222.22", "ZIP": 98765, "MERCH_NAME": "Exxme",
   "NICKNAME": "Xyz"},
  {"ACCT_NUM": 20, "AUTH_DATE": "2011-01-30",
   "AUTH_AMT": "3.33", "ZIP": 12345, "MERCH_NAME": "Acme",
   "ROUTE": ["68.86.85.188", "64.215.26.111"]},
  … ]
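Because the schema is optional, records in the same array can carry different fields: NICKNAME and ROUTE appear in only one record each. A short sketch with Python's standard json module, using the three records shown in the example written as strict JSON (the trailing … is left out because it is not valid JSON), shows one way a consumer can cope. Treating a missing field as None is an assumption about how absent fields should be handled, not Jaql behavior.

```python
import json

# The three example records, written as strict JSON with quoted keys.
text = """[
  {"ACCT_NUM": 18, "AUTH_DATE": "2011-01-29", "AUTH_AMT": "111.11",
   "ZIP": 98765, "MERCH_NAME": "Acme"},
  {"ACCT_NUM": 19, "AUTH_DATE": "2011-01-29", "AUTH_AMT": "222.22",
   "ZIP": 98765, "MERCH_NAME": "Exxme", "NICKNAME": "Xyz"},
  {"ACCT_NUM": 20, "AUTH_DATE": "2011-01-30", "AUTH_AMT": "3.33",
   "ZIP": 12345, "MERCH_NAME": "Acme",
   "ROUTE": ["68.86.85.188", "64.215.26.111"]}
]"""

records = json.loads(text)
for rec in records:
    # Optional fields: .get() returns None (or a default) instead of raising.
    print(rec["ACCT_NUM"], rec.get("NICKNAME"), len(rec.get("ROUTE", [])))

# AUTH_AMT is stored as a string, so it must be converted before arithmetic.
total = sum(float(rec["AUTH_AMT"]) for rec in records)
print(round(total, 2))
```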

Page 181: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Running Jaql

Jaql Shell

Interactive, e.g. jaqlshell

Batch, e.g. jaqlshell -b myscript.jaql

Inline, e.g. jaqlshell -e jaqlstatement

Modes

Cluster, e.g. jaqlshell -c

Minicluster, e.g. jaqlshell

Page 182: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql query language

• Sources and sinks

e.g. copy data from a local file (the source) to a new file on HDFS (the sink):

read(file("input.json")) -> write(hdfs("output"))

Core operators: Filter, Transform, Expand, Group, Join, Union, Tee, Sort, Top

A query is a pipeline: source -> operator -> operator -> … -> sink

Page 183: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql query language

• Variables

The equals operator (=) binds a source's output to a variable

e.g. $tweets = read(hdfs("twitterfeed"))

Pipes, streams, and consumers

The pipe operator (->) streams data to a consumer

The pipe expects an array as input

e.g. $tweets -> filter $.from_src == 'tweetdeck';

$ is an implicit variable that refers to the current array element
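The pipe-and-filter pattern in $tweets -> filter $.from_src == 'tweetdeck' can be mimicked in Python to show what happens to the stream: each operator consumes an array and produces a new one, and $ stands for the element currently being examined (the lambda parameter d below). The tweet records and the helper names pipe, filter_op, and transform_op are hypothetical; this is not how Jaql is implemented.

```python
def filter_op(predicate):
    # Jaql: -> filter <boolean expression on $>
    return lambda array: [item for item in array if predicate(item)]

def transform_op(fn):
    # Jaql: -> transform <expression on $>
    return lambda array: [fn(item) for item in array]

def pipe(source, *operators):
    """Feed the source array through each operator in turn (the -> arrow)."""
    data = source
    for op in operators:
        data = op(data)
    return data

# Stand-in for $tweets = read(hdfs("twitterfeed")): made-up sample records.
tweets = [
    {"from_src": "tweetdeck", "text": "hello"},
    {"from_src": "web", "text": "hi"},
    {"from_src": "tweetdeck", "text": "bye"},
]

result = pipe(tweets,
              filter_op(lambda d: d["from_src"] == "tweetdeck"),
              transform_op(lambda d: d["text"]))
print(result)
```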

Page 184: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql query language

• Categories of built-in functions: agg, array, binary, core, date, function, hadoop, index, io, nil, number, random, record, regex, schema, string, system, xml

Page 185: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql – Data Storage

Data store examples: Amazon S3, DB2, HBase, HDFS, HTTP, JDBC, local file system

Data format examples: JSON, Avro, CSV, XML

Page 186: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Jaql sample code

$ jaqlshell -c

jaql> $foreignaid =
        read(del("econ_assist.csv",
                 {schema: schema {country: string, sum: long}}));
jaql> $foreignaid
        -> group by $country = ($.country)
           into {$country.country, sum($[*].sum)};

Page 187: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Hadoop core lab – Part 3

Page 188: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

BigDataUniversity.com

Page 189: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Acknowledgements and Disclaimers

Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.

The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

© Copyright IBM Corporation 2013. All rights reserved.

• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM, the IBM logo, ibm.com, InfoSphere and BigInsights, Streams, and DB2 are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml

Other company, product, or service names may be trademarks or service marks of others.

Page 190: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Communities

• On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more

o Find the community that interests you …

• Information Management bit.ly/InfoMgmtCommunity

• Business Analytics bit.ly/AnalyticsCommunity

• Enterprise Content Management bit.ly/ECMCommunity

• IBM Champions

o Recognizing individuals who have made the most outstanding contributions to Information Management, Business Analytics, and Enterprise Content Management communities

• ibm.com/champion

Page 191: IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

Thank You

Your feedback is important!

• Access the Conference Agenda Builder to complete your session surveys

o Any web or mobile browser at http://iod13surveys.com/surveys.html

o Any Agenda Builder kiosk onsite