Cisco Hadoop Summit 2013

The Data Center and Hadoop Jacob Rapp, Cisco [email protected]

description

Cisco's presentations at Hadoop Summit 2013.

Transcript of Cisco Hadoop Summit 2013

Page 1: Cisco Hadoop Summit 2013

The Data Center and Hadoop

Jacob Rapp, Cisco

[email protected]

Page 2: Cisco Hadoop Summit 2013

Agenda

Hadoop Considerations
• Traffic Types, Job Patterns, Network Considerations, Compute

Network Integration
• Co-exist with current Data Center infrastructure
• Open, Programmable and Application-Aware Networks

Multi-tenancy
• Remove the “Silo clusters”

Page 3: Cisco Hadoop Summit 2013


Hadoop Job Patterns and Network Traffic

Page 4: Cisco Hadoop Summit 2013

Job Patterns

[Figure: three job patterns, each ending in a Reduce stage, with the ratio of ingress to egress data set size]

• Analyze: Ingress vs. Egress Data Set 1:0.3
• Extract Transform Load (ETL): Ingress vs. Egress Data Set 1:1
• Explode: Ingress vs. Egress Data Set 1:2

The time at which the reducers start depends on:

mapred.reduce.slowstart.completed.maps

It does not change the amount of data sent to the reducers, but it may change when that data is sent.
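To make the slow-start setting concrete, the sketch below estimates how many map tasks must finish before reducers become eligible to start. It assumes the Hadoop 1.x meaning of mapred.reduce.slowstart.completed.maps (a fraction between 0 and 1, default 0.05); the function name and the round-up behavior are assumptions of this sketch, not something taken from the slides.

```python
import math


def maps_needed_before_reduce(total_maps: int, slowstart: float = 0.05) -> int:
    """Estimate how many map tasks must complete before reducers may start.

    slowstart stands in for mapred.reduce.slowstart.completed.maps.
    Rounding up is an assumption of this sketch.
    """
    if total_maps < 0:
        raise ValueError("total_maps must not be negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0 and 1")
    return math.ceil(slowstart * total_maps)


# A value of 1.0 delays the shuffle until every map has finished;
# a value of 0.5 lets reducers start once half of the maps are done.
late_start = maps_needed_before_reduce(200, 1.0)
half_way = maps_needed_before_reduce(200, 0.5)
```

A higher value shifts shuffle traffic later in the job and compresses it into a shorter window; the total shuffle volume stays the same, as the slide notes.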

Page 5: Cisco Hadoop Summit 2013

Traffic Types

• Small Flows/Messaging (Admin Related, Heart-beats, Keep-alive, delay sensitive application messaging)
• Small – Medium Incast (Hadoop Shuffle)
• Large Flows (HDFS Ingest)
• Large Incast (Hadoop Replication)

Page 6: Cisco Hadoop Summit 2013

Map and Reduce Traffic

Many-to-Many Traffic Pattern

[Figure labels: Map 1, Map 2, Map 3 ... Map N; Shuffle; Reducer 1, Reducer 2, Reducer 3 ... Reducer N; Output Replication; HDFS; NameNode; JobTracker; ZooKeeper]
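The many-to-many pattern can be sized with simple arithmetic: in the shuffle, every reducer fetches one partition from every map output. The sketch below is a rough illustration only. The function name and the input numbers are hypothetical, and it assumes map output is spread evenly across reducers, which real jobs rarely achieve.

```python
def shuffle_summary(num_maps: int, num_reducers: int, map_output_mb: int) -> dict:
    """Rough size of the shuffle for one job (even partitioning assumed)."""
    flows = num_maps * num_reducers          # one fetch per (map, reducer) pair
    return {
        "flows": flows,
        "fan_in_per_reducer": num_maps,      # sources converging on each reducer
        "mb_per_flow": map_output_mb / flows if flows else 0.0,
        "mb_per_reducer": map_output_mb / num_reducers if num_reducers else 0.0,
    }


# Hypothetical job: 100 maps, 10 reducers, 10,000 MB of map output.
summary = shuffle_summary(100, 10, 10_000)
```

The fan-in per reducer is what makes the shuffle an incast pattern: many small flows converge on the same switch port at roughly the same time.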

Page 7: Cisco Hadoop Summit 2013

Job Patterns

Job Patterns have varying impact on network utilization

• Analyze: Simulated with Shakespeare Wordcount
• Extract Transform Load (ETL): Simulated with Yahoo TeraSort
• Extract Transform Load (ETL): Simulated with Yahoo TeraSort with output replication

Page 8: Cisco Hadoop Summit 2013


Integration into the Data Center

Page 9: Cisco Hadoop Summit 2013

Integration Considerations

Network Attributes
• Architecture
• Availability
• Capacity, Scale & Oversubscription
• Flexibility
• Management & Visibility

[Figure labels: Availability, Buffering, Oversubscription, Data Node Speed, Latency]

Page 10: Cisco Hadoop Summit 2013

Data Node Speed Differences

• Single 1GE: 100% Utilized
• Dual 1GE: 75% Utilized
• 10GE: 40% Utilized

1GE is generally used because of the cost/performance trade-off, though 10GE can provide benefits depending on the workload.

Page 11: Cisco Hadoop Summit 2013

Availability: Single Attached vs. Dual Attached Node

• No single point of failure from the network viewpoint. No impact on job completion time

• NIC bonding configured in Linux, using the LACP bonding mode

• Effective load-sharing of traffic flows across the two NICs

• Recommended: change the hashing to src-dst-ip-port (on both the network side and the Linux NIC bonding) for optimal load-sharing
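The hashing recommendation can be illustrated with a toy model. The sketch below compares an IP-only hash with an IP-plus-port hash when choosing one of two bonded links. The XOR formula is a simplification modeled on the layer3+4 policy described in older Linux bonding documentation; real kernels and switches use their own hash functions, and the addresses and ports here are hypothetical.

```python
import ipaddress


def _ip_xor_low16(src_ip: str, dst_ip: str) -> int:
    """XOR the two addresses and keep the low 16 bits."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) & 0xFFFF


def pick_link_ip_only(src_ip: str, dst_ip: str, links: int = 2) -> int:
    """Link choice when only source and destination IP are hashed."""
    return _ip_xor_low16(src_ip, dst_ip) % links


def pick_link_ip_port(src_ip: str, src_port: int,
                      dst_ip: str, dst_port: int, links: int = 2) -> int:
    """Link choice when IPs and TCP/UDP ports are hashed (src-dst-ip-port)."""
    return ((src_port ^ dst_port) ^ _ip_xor_low16(src_ip, dst_ip)) % links


# Four hypothetical flows between the same pair of nodes.
flows = [("10.0.0.2", port, "10.0.0.3", 50010)
         for port in (40000, 40001, 40002, 40003)]

links_ip_only = {pick_link_ip_only(s, d) for s, _, d, _ in flows}
links_ip_port = {pick_link_ip_port(s, sp, d, dp) for s, sp, d, dp in flows}
```

With IP-only hashing, every flow between the two nodes lands on the same link; adding the ports lets parallel flows between the same pair spread across both NICs.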

Page 12: Cisco Hadoop Summit 2013

1GE vs. 10GE Buffer Usage

[Figure: Cell Usage and Job Completion plotted over the run (axis ticks 1 to 793). Series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %]

Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer.

With 10GE, the data node has a wider pipe to receive data, which lessens the need for buffering in the network, because the total aggregate transfer rate and the amount of data do not increase substantially. This is due, in part, to the limits of I/O and compute capabilities.
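The reasoning above can be checked with a back-of-the-envelope fluid model: a switch port only queues data when the offered rate exceeds the rate at which the port drains toward the node. The sketch below is a deliberate simplification. It ignores TCP back-off and shared-buffer behavior, and the burst rate and duration are hypothetical numbers, not measurements from the presentation.

```python
def peak_backlog_bytes(offered_bps: int, port_bps: int, burst_ms: int) -> int:
    """Peak queue depth for a constant-rate burst into one egress port.

    offered_bps: aggregate rate of senders during the burst, in bits/s
    port_bps:    speed of the port facing the data node, in bits/s
    burst_ms:    duration of the burst, in milliseconds
    """
    excess_bps = max(0, offered_bps - port_bps)
    # bits/s * ms / 1000 gives bits; divide by 8 for bytes
    return excess_bps * burst_ms // 8000


# Hypothetical 10 ms incast burst at 3 Gbit/s, limited by sender I/O.
backlog_1g = peak_backlog_bytes(3_000_000_000, 1_000_000_000, 10)
backlog_10g = peak_backlog_bytes(3_000_000_000, 10_000_000_000, 10)
```

If the senders cannot offer much more than a few Gbit/s in aggregate, the 10GE port drains the burst as it arrives and the queue stays empty, while the 1GE port has to buffer the excess.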

Page 13: Cisco Hadoop Summit 2013

Integration Considerations

Goals

• Extensive Validation of Hadoop Workload

• Reference Architecture

Make it easy for Enterprise

• Demystify Network for Hadoop Deployment

• Integration with Enterprise with efficient choices of network topology/devices

Findings

• 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization

• 10G provides reduced burst at the access layer

• Dual-attached server is the recommended design, at 1G or 10G; 10G for future proofing

• Rack failure has the biggest impact on job completion time

• Hadoop does not require a non-blocking network

• Latency does not matter much in Hadoop workloads

More details from Hadoop Summit 2012 at:

http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design

http://youtu.be/YJODsK0T67A

Page 14: Cisco Hadoop Summit 2013


Network Integration

Page 15: Cisco Hadoop Summit 2013

© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

Which port is connected?

n3548-001# show interface brief

--------------------------------------------------------------------------------
Ethernet      VLAN   Type Mode   Status  Reason                   Speed     Port
Interface                                                                   Ch #
--------------------------------------------------------------------------------
Eth1/1        1      eth  access up      none                     10G(D)    --
Eth1/2        1      eth  access up      none                     10G(D)    --
Eth1/3        1      eth  access up      none                     10G(D)    --
Eth1/4        1      eth  access up      none                     10G(D)    --
Eth1/5        1      eth  access up      none                     10G(D)    --
..
Eth1/33       1      eth  access up      none                     10G(D)    --
Eth1/34       1      eth  access up      none                     10G(D)    --
Eth1/35       1      eth  access down    SFP not inserted         10G(D)    --
Eth1/36       1      eth  access down    SFP not inserted         10G(D)    --
Eth1/37       1      eth  access down    Administratively down    10G(D)    --
.

Page 16: Cisco Hadoop Summit 2013

What is connected there? Classic Network View

n3548-001# show mac address-table dynamic
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since first seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports
---------+-----------------+--------+---------+------+----+------------------
* 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
* 1        e8b7.484d.a20a    dynamic   60560      F    F  Eth1/31
* 1        e8b7.484d.a73e    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.a740    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.ad15    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.ad17    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.b3e9    dynamic   60570      F    F  Eth1/25
* 1        e8b7.484d.b3eb    dynamic   60560      F    F  Eth1/25
..

MAC addresses of the connected devices … and the port they are on …

Page 17: Cisco Hadoop Summit 2013

What is actually connected there?

n3548-001# portServerMap
=======================================
Port      Server FQDN
---------------------------------------
Eth1/1    c200-m2-10g2-001.cluster10g.com
Eth1/2    c200-m2-10g2-002.cluster10g.com
Eth1/3    c200-m2-10g2-003.cluster10g.com
Eth1/4    c200-m2-10g2-004.cluster10g.com
Eth1/5    c200-m2-10g2-005.cluster10g.com
Eth1/6    c200-m2-10g2-006.cluster10g.com
Eth1/7    c200-m2-10g2-031.cluster10g.com
Eth1/8    c200-m2-10g2-008.cluster10g.com
Eth1/9    c200-m2-10g2-009.cluster10g.com
Eth1/11   c200-m2-10g2-011.cluster10g.com
...

Which server is connected to which port on the switch …

Note: Eth1/10 is missing because there is nothing connected to it
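One way such a port-to-server view could be assembled is to parse the MAC address table and join it with a MAC-to-hostname lookup. The sketch below is not the presenter's script (the slides point to github.com/datacenter for those). It parses rows in the format of the `show mac address-table dynamic` output shown earlier; the hostnames in the lookup table are hypothetical placeholders, since the slides do not say which MAC belongs to which server.

```python
import re

# Matches rows such as:
# * 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
MAC_ROW = re.compile(
    r"^\*?\s*(?P<vlan>\d+)\s+"
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+"
    r"dynamic\s+\d+\s+\S+\s+\S+\s+"
    r"(?P<port>Eth\d+/\d+)$"
)


def parse_mac_table(output: str) -> dict:
    """Return {switch port: [MAC addresses learned on that port]}."""
    ports = {}
    for line in output.splitlines():
        match = MAC_ROW.match(line.strip())
        if match:
            ports.setdefault(match["port"], []).append(match["mac"])
    return ports


def port_server_map(ports: dict, mac_to_fqdn: dict) -> dict:
    """Join ports to hostnames using the first MAC with a known name."""
    result = {}
    for port, macs in ports.items():
        for mac in macs:
            if mac in mac_to_fqdn:
                result[port] = mac_to_fqdn[mac]
                break
    return result


SAMPLE = """\
* 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
* 1        e8b7.484d.a20a    dynamic   60560      F    F  Eth1/31
* 1        e8b7.484d.a73e    dynamic   60560      F    F  Eth1/34
"""

# Hypothetical lookup; in practice this could come from ARP plus DNS.
HYPOTHETICAL_HOSTS = {
    "e8b7.484d.a208": "node-a.example.com",
    "e8b7.484d.a73e": "node-b.example.com",
}

ports = parse_mac_table(SAMPLE)
mapping = port_server_map(ports, HYPOTHETICAL_HOSTS)
```

Ports with no learned MAC never appear in the parsed table, which is consistent with Eth1/10 being absent from the listing above.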

Page 18: Cisco Hadoop Summit 2013

What is running on those servers?

Hadoop - TaskTracker List

n3548-001# trackerList
===========================================
Port      Server              Server Port
-------------------------------------------
Eth1/2    c200-m2-10g2-002    50544
Eth1/3    c200-m2-10g2-003    41909
Eth1/4    c200-m2-10g2-004    36480
Eth1/5    c200-m2-10g2-005    38179
Eth1/6    c200-m2-10g2-006    51375
Eth1/7    c200-m2-10g2-031    41915
Eth1/8    c200-m2-10g2-008    50983
Eth1/9    c200-m2-10g2-009    37056
Eth1/11   c200-m2-10g2-011    35882
Eth1/12   c200-m2-10g2-012    44551
...

Note:
• Eth1/1 is not on the list because it’s the namenode and is not running a tasktracker
• Eth1/10 is not on the list because there is nothing connected to it
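A listing like this could be produced by joining the TaskTracker names reported by Hadoop with the port-to-server map. The sketch below assumes tracker names of the form `tracker_<fqdn>:localhost/127.0.0.1:<port>`, which is how Hadoop 1.x commonly reports active trackers; that format, the function names, and the join logic are assumptions of this sketch and not the presenter's code. The hostnames and port numbers are taken from the two listings above.

```python
import re

# Assumed name format: tracker_<fqdn>:localhost/127.0.0.1:<port>
TRACKER = re.compile(r"^tracker_(?P<host>[^:]+):\S*:(?P<port>\d+)$")


def parse_tracker(name: str):
    """Return (fqdn, tracker port) or None if the name does not match."""
    match = TRACKER.match(name.strip())
    if match is None:
        return None
    return match["host"], int(match["port"])


def tracker_list(tracker_names, fqdn_to_switch_port):
    """Build (switch port, short hostname, tracker port) rows."""
    rows = []
    for name in tracker_names:
        parsed = parse_tracker(name)
        if parsed is None:
            continue
        fqdn, tracker_port = parsed
        switch_port = fqdn_to_switch_port.get(fqdn)
        if switch_port is not None:
            rows.append((switch_port, fqdn.split(".")[0], tracker_port))
    return rows


FQDN_TO_PORT = {
    "c200-m2-10g2-001.cluster10g.com": "Eth1/1",   # namenode, no tasktracker
    "c200-m2-10g2-002.cluster10g.com": "Eth1/2",
    "c200-m2-10g2-003.cluster10g.com": "Eth1/3",
}

TRACKERS = [
    "tracker_c200-m2-10g2-002.cluster10g.com:localhost/127.0.0.1:50544",
    "tracker_c200-m2-10g2-003.cluster10g.com:localhost/127.0.0.1:41909",
]

rows = tracker_list(TRACKERS, FQDN_TO_PORT)
```

Because the join starts from the tracker names, a connected server that runs no TaskTracker (such as the namenode on Eth1/1) simply drops out of the result.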

Page 19: Cisco Hadoop Summit 2013

Which node is using the buffer?

n3548-001# bufferServerMap
===================================================================
Port      Server              1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001    0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002    384KB    384KB    1536KB   2304KB   2304KB
Eth1/3    c200-m2-10g2-003    384KB    384KB    1152KB   1536KB   1536KB
Eth1/4    c200-m2-10g2-004    384KB    384KB    2304KB   2304KB   2304KB
Eth1/5    c200-m2-10g2-005    384KB    384KB    768KB    1536KB   1536KB
Eth1/6    c200-m2-10g2-006    384KB    2304KB   2304KB   2304KB   2304KB
Eth1/7    c200-m2-10g2-031    384KB    384KB    3456KB   3840KB   3840KB
Eth1/8    c200-m2-10g2-008    768KB    768KB    2688KB   2688KB   2688KB
Eth1/9    c200-m2-10g2-009    384KB    384KB    2304KB   2304KB   2304KB
Eth1/11   c200-m2-10g2-011    384KB    384KB    1920KB   1920KB   1920KB
...

Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it’s the name node
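Once buffer usage is attributed to servers, answering "which node is using the buffer?" becomes a sort. The sketch below parses rows in the layout of the bufferServerMap output above and ranks ports by a chosen time window. The parser and function names are hypothetical; the sample rows are copied from the listing.

```python
import re

WINDOWS = ("1sec", "5sec", "60sec", "5min", "1hr")

# Matches rows such as: Eth1/7  c200-m2-10g2-031  384KB 384KB 3456KB 3840KB 3840KB
ROW = re.compile(r"^(Eth\d+/\d+)\s+(\S+)" + r"\s+(\d+)KB" * len(WINDOWS) + r"$")


def parse_buffer_map(output: str) -> list:
    """Return one dict per port with usage in KB for each time window."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            port, server, *values = match.groups()
            usage = dict(zip(WINDOWS, (int(v) for v in values)))
            rows.append({"port": port, "server": server, **usage})
    return rows


def busiest(rows: list, window: str = "60sec", top: int = 3) -> list:
    """Return (port, server, KB) for the heaviest buffer users in a window."""
    ranked = sorted(rows, key=lambda row: row[window], reverse=True)
    return [(r["port"], r["server"], r[window]) for r in ranked[:top]]


SAMPLE = """\
Eth1/1    c200-m2-10g2-001    0KB      0KB      0KB      0KB      0KB
Eth1/6    c200-m2-10g2-006    384KB    2304KB   2304KB   2304KB   2304KB
Eth1/7    c200-m2-10g2-031    384KB    384KB    3456KB   3840KB   3840KB
Eth1/8    c200-m2-10g2-008    768KB    768KB    2688KB   2688KB   2688KB
"""

rows = parse_buffer_map(SAMPLE)
top_60s = busiest(rows, "60sec", top=1)
top_5s = busiest(rows, "5sec", top=1)
```

Comparing short and long windows separates a port that is bursting right now from one that peaked earlier in the job.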

Page 20: Cisco Hadoop Summit 2013

What’s running on this cluster + Buffer usage per server …

n3548-001# jobsBuffer
Hadoop Job Info ...
===================================================================
1 jobs currently running
JobId                    RunTime(secs)    User      Priority
job_201306131423_0009    120              hadoop    NORMAL
===================================================================
Buffer Info - Per Port
Port      Server              1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001    0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002    384KB    384KB    768KB    768KB    768KB
Eth1/3    c200-m2-10g2-003    384KB    384KB    1152KB   1152KB   1152KB
Eth1/4    c200-m2-10g2-004    384KB    1536KB   1536KB   1536KB   1536KB
Eth1/5    c200-m2-10g2-005    384KB    768KB    1152KB   1152KB   1152KB
..

What jobs were running during peak buffer usage … and for how long were they running

Page 21: Cisco Hadoop Summit 2013

What’s running on this cluster + Buffer usage per server …

n3548-001(config)# jobsBuffer
Hadoop Job Info ...
===================================================================
0 jobs currently running
JobId                    RunTime(secs)    User      Priority
===================================================================
Buffer Info - Per Port
Port      Server              1sec     5sec     60sec    5min     1hr
-------------------------------------------------------------------
Eth1/1    c200-m2-10g2-001    0KB      0KB      0KB      0KB      0KB
Eth1/2    c200-m2-10g2-002    0KB      0KB      0KB      1920KB   1920KB
Eth1/3    c200-m2-10g2-003    0KB      0KB      0KB      2304KB   2304KB
Eth1/4    c200-m2-10g2-004    0KB      0KB      0KB      2688KB   2688KB
Eth1/5    c200-m2-10g2-005    0KB      0KB      0KB      2304KB   2304KB
Eth1/6    c200-m2-10g2-006    0KB      0KB      0KB      2304KB   2304KB
Eth1/7    c200-m2-10g2-031    0KB      0KB      0KB      1920KB   2688KB
.

Historic look at the buffer usage …

Page 22: Cisco Hadoop Summit 2013

Server Resource Monitoring – CPU, Connections, etc.

Page 23: Cisco Hadoop Summit 2013

Network Resource Monitoring – Buffer Counters, etc.

Page 24: Cisco Hadoop Summit 2013


Server + Network

Page 25: Cisco Hadoop Summit 2013

Shuffle vs Replication + Buffer Usage

Terasort on 10G

[Figure: series Buffer Usage, Shuffle, Replication, Reduce, Map; horizontal axis from 0 to 780 in steps of 60]

Page 26: Cisco Hadoop Summit 2013

Network + Application Visibility Model

[Figure labels: Push Data (Python Socket); Application Logs; Analyze; API to Application Info; Synchronize Time; PTP Grandmaster (OPTIONAL)]

github.com/datacenter
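The "Push Data (Python Socket)" element of the model can be sketched with the standard socket module: the switch side serializes a buffer sample and pushes it to a collector that analyzes it alongside application logs. This is a minimal illustration, not the code published at github.com/datacenter. The JSON-lines wire format, the field names, and both function names are assumptions.

```python
import json
import socket


def push_sample(sock: socket.socket, switch: str, port: str,
                window: str, kilobytes: int) -> None:
    """Send one buffer sample as a single JSON line (assumed wire format)."""
    record = {"switch": switch, "port": port,
              "window": window, "buffer_kb": kilobytes}
    sock.sendall((json.dumps(record) + "\n").encode("utf-8"))


def read_samples(sock: socket.socket) -> list:
    """Collector side: read until the sender closes, then decode each line."""
    data = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]


# Local demonstration with a connected socket pair instead of a real switch.
sender, collector = socket.socketpair()
push_sample(sender, "n3548-001", "Eth1/7", "60sec", 3456)
push_sample(sender, "n3548-001", "Eth1/8", "60sec", 2688)
sender.close()
samples = read_samples(collector)
collector.close()
```

In a real deployment the sender would connect to the collector's address with `socket.create_connection`, and timestamps would matter, which is where the optional PTP time synchronization in the model comes in.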

Page 27: Cisco Hadoop Summit 2013


Multi-tenant Environments

Page 28: Cisco Hadoop Summit 2013

Various Multitenant Environments

• Hadoop + HBASE: Need to understand Traffic Patterns
• Job Based: Scheduling Dependent
• Department Based: Permissions and Scheduling Dependent

Page 29: Cisco Hadoop Summit 2013

[Figure labels: Map 1, Map 2, Map 3 ... Map N; Shuffle; Reducer 1, Reducer 2, Reducer 3 ... Reducer N; Output Replication; HDFS; Region Server (x2); Client (x2); Read; Update; Major Compaction]

Page 30: Cisco Hadoop Summit 2013

HBase During Major Compaction

Read/Update Latency: Comparison of Non-QoS vs. QoS Policy

~45% improvement for Read

Switch Buffer Usage: With Network QoS Policy to prioritize HBase Update/Read Operations

Page 31: Cisco Hadoop Summit 2013

HBase + Hadoop Map Reduce

Read/Update Latency: Comparison of Non-QoS vs. QoS Policy

~60% improvement for Read

Switch Buffer Usage: With Network QoS Policy to prioritize HBase Update/Read Operations

Page 32: Cisco Hadoop Summit 2013

Cisco Unified Data Center

• UNIFIED FABRIC: Highly Scalable, Secure Network Fabric
• UNIFIED COMPUTING: Modular Stateless Computing Elements
• UNIFIED MANAGEMENT: Automated Management, Manages Enterprise Workloads

www.cisco.com/go/ucs
www.cisco.com/go/nexus
http://www.cisco.com/go/workloadautomation

Cisco.com Big Data: www.cisco.com/go/bigdata

THANK YOU FOR LISTENING

Data Center Script Examples from Presentation:

github.com/datacenter