Location-aware MapReduce in Virtual Cloud 2011 IEEE computer society International Conference on...

Location-aware MapReduce in Virtual Cloud

2011 IEEE Computer Society

International Conference on Parallel Processing

Yifeng Geng1,2, Shimin Chen3, YongWei Wu1*, Ryan Wu3, Guangwen Yang1,2, Weimin Zheng1

Reporter: Yu Chih Lin

Outline

Introduction

Background

Model and New Strategy

Implementation

Experiment

Conclusion

Introduction

MapReduce is an important programming model for

• Processing large data sets

• Generating large data sets

Commonly used in applications such as

• Web indexing

• Data mining

• Machine learning

Introduction

Multi-core CPUs support virtualization technology

• Run two or more virtual machines (VMs) simultaneously

• Share I/O resources among users

MapReduce is set up on a distributed file system

• Google uses GFS

• Hadoop uses HDFS

Introduction

Running MapReduce in a virtual environment raises three major problems

• Disk sharing results in unbalanced data distribution and unbalanced workload

• I/O interference caused by data imbalance and load imbalance

• Disk sharing reduces the data redundancy

Introduction

Purpose of this paper

• Abstract a model

• Define evaluation metrics

• Analyze the data pattern and task pattern

For Hadoop

• Propose a location-aware file block allocation strategy

Introduction

Three main benefits of the proposed strategy

• MapReduce’s workload is more balanced

• Reduces I/O interference and improves HDFS’s performance

• Retains data’s redundancy

Background

Traditional I/O interference comes in two kinds

• Disk interference: occurs when multiple processes try to access the same disk simultaneously

• Network interference: mainly concerns latency and throughput

Background

There are two kinds of I/O virtualization

• KVM

• Paravirtualization

Virtual machines share CPUs and memory well, but not I/O.

Background

Virtualized Hadoop architecture

Build a general model to analyze different allocation strategies

• Data pattern

• Task pattern

To simplify the analysis, make four assumptions

• Every physical machine has the same I/O devices and hosts the same number of virtual machines

• All virtual machines are in one local area network and the network topology is flat

• The workload can be assigned randomly to any virtual machine, with no restrictions

• All file blocks have the same size

actualReplicaNum (a):

average number of unique file blocks in a physical machine

Ideal value is 3 (when the replica number is 3)

maxBlockNum (b):

maximum number of blocks in a physical machine

blockNumSigma (c):

variation of the data pattern, that is, how unevenly blocks are spread across physical machines

Ideal value is 0

maxAssignedNum (d):

maximum number of tasks assigned to a physical machine

assignedNumSigma (e):

variation of the task pattern, which reveals its load balance
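The five metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: actualReplicaNum is read here as the number of distinct physical machines holding a block's replicas, averaged over blocks (the reading consistent with an ideal value of 3), and both sigma metrics are taken to be population standard deviations. The function names and the input layout are hypothetical.

```python
from statistics import pstdev


def data_pattern_metrics(allocation, p):
    """allocation[i] lists the physical machine of each replica of block i."""
    # (a) distinct physical machines per block, averaged over all blocks
    actual_replica_num = sum(len(set(hosts)) for hosts in allocation) / len(allocation)
    # Number of block replicas stored on each of the p physical machines
    block_counts = [0] * p
    for hosts in allocation:
        for machine in hosts:
            block_counts[machine] += 1
    return {
        "actualReplicaNum": actual_replica_num,
        "maxBlockNum": max(block_counts),          # (b)
        "blockNumSigma": pstdev(block_counts),     # (c), assumed std. deviation
    }


def task_pattern_metrics(assignment, p):
    """assignment[i] is the physical machine that runs the task for block i."""
    task_counts = [0] * p
    for machine in assignment:
        task_counts[machine] += 1
    return {
        "maxAssignedNum": max(task_counts),        # (d)
        "assignedNumSigma": pstdev(task_counts),   # (e), assumed std. deviation
    }
```

For instance, `data_pattern_metrics([(0, 0, 1), (1, 2, 3)], p=4)` reports an actualReplicaNum of 2.5, because the first block has two of its replicas on the same physical machine.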

A new allocation strategy

• Places the replicas of a file block on different physical machines

• Keeps the number of blocks on each physical machine balanced

Present two intuitive ways

• Round-robin allocation

• Serpentine allocation

For example, take p = 8, n = 8 (p: physical machines, n: file blocks)

An example of round-robin allocation

For example, take p = 8, n = 8 (p: physical machines, n: file blocks)

An example of serpentine allocation

Evaluation metrics for data pattern

actualReplicaNum=3, maxBlockNum=3, blockNumSigma=0
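A round-robin allocation for p = 8, n = 8 can be reproduced with a short sketch. The placement order is an assumption: replicas are dealt to consecutive physical machines in cyclic order, which may differ in detail from the paper's figure. The helper name `round_robin_allocation` is hypothetical, and the sketch requires the replica count to be no larger than p so that a block's replicas land on distinct machines.

```python
from statistics import pstdev


def round_robin_allocation(n, p, replicas=3):
    """Deal the replicas of n blocks onto p physical machines in cyclic order."""
    allocation = []
    slot = 0
    for _ in range(n):
        # Consecutive machines modulo p are distinct as long as replicas <= p
        hosts = tuple((slot + k) % p for k in range(replicas))
        allocation.append(hosts)
        slot += replicas
    return allocation


alloc = round_robin_allocation(n=8, p=8)
counts = [sum(hosts.count(m) for hosts in alloc) for m in range(8)]
actual_replica_num = sum(len(set(hosts)) for hosts in alloc) / len(alloc)

print(alloc[:3])  # [(0, 1, 2), (3, 4, 5), (6, 7, 0)]
print(actual_replica_num, max(counts), pstdev(counts))  # 3.0 3 0.0
```

Under this ordering the 24 replicas fill the 8 machines exactly three times over, which is why every machine ends up with 3 blocks and the spread is 0.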

Average results over the enumerated task patterns

Round-robin allocation:

maxAssignedNum=2.2724 , assignedNumSigma=0.7943

Serpentine allocation:

maxAssignedNum=2.2705 , assignedNumSigma=0.79323
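One way to obtain such averages is to enumerate every task pattern for a fixed allocation. The sketch below assumes that each block's task runs on exactly one of the physical machines holding a replica of that block, that all such choices are equally likely, and that the sigma metric is a population standard deviation. These assumptions and the name `enumerate_task_patterns` are illustrative, so the printed values need not match the figures above.

```python
from itertools import product
from statistics import pstdev


def enumerate_task_patterns(allocation, p):
    """Average maxAssignedNum and assignedNumSigma over all task patterns."""
    total_max = 0.0
    total_sigma = 0.0
    patterns = 0
    # Each pattern picks, for every block, one machine that holds a replica
    for assignment in product(*allocation):
        counts = [0] * p
        for machine in assignment:
            counts[machine] += 1
        total_max += max(counts)
        total_sigma += pstdev(counts)
        patterns += 1
    return total_max / patterns, total_sigma / patterns


# Assumed round-robin layout for p = 8, n = 8, three replicas per block
round_robin = [tuple((3 * i + k) % 8 for k in range(3)) for i in range(8)]
avg_max, avg_sigma = enumerate_task_patterns(round_robin, p=8)
print(f"maxAssignedNum={avg_max:.4f}, assignedNumSigma={avg_sigma:.4f}")
```

With 8 blocks and 3 replicas each there are 3^8 = 6,561 patterns, so full enumeration is cheap here; larger n calls for sampling instead.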

Implementation

Choose serpentine allocation

Add the location information of virtual node into the network topology

For example, when all physical machines share one rack

• a virtual node's location may be changed from /default-rack to /Phy0

For example, when the physical machines span several racks

• a virtual node's location may be changed from /rack1 to /rack1/Phy0

Implementation

The mechanism is easy to adopt in Hadoop

• It keeps compatibility with native Hadoop

• It uses a special label starting with “Phy”

• The label identifies the locations of virtual machines
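The labeling convention can be illustrated with two small helpers. This is a sketch of the path convention only, following the /default-rack to /Phy0 and /rack1 to /rack1/Phy0 examples; it does not show how Hadoop resolves topology, and both function names are hypothetical.

```python
def add_physical_label(rack_path, phy_id):
    """Extend a rack path with the label of the hosting physical machine."""
    label = f"Phy{phy_id}"
    if rack_path == "/default-rack":
        # Single-rack case: the label replaces the default rack
        return f"/{label}"
    # Multi-rack case: the label becomes one more level below the rack
    return f"{rack_path.rstrip('/')}/{label}"


def physical_label(location):
    """Return the 'Phy...' label of a location, or None for a native path."""
    last = location.rstrip("/").rsplit("/", 1)[-1]
    return last if last.startswith("Phy") else None


print(add_physical_label("/default-rack", 0))  # /Phy0
print(add_physical_label("/rack1", 0))         # /rack1/Phy0
print(physical_label("/rack1/Phy0"))           # Phy0
print(physical_label("/rack1"))                # None
```

Because a path without a “Phy” component yields `None`, a location written for native Hadoop is still accepted, which is the compatibility property the slide mentions.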

Implementation

To maintain the block information for each virtual node

• In Hadoop's NameNode, add a list of virtual nodes sorted by number of blocks

On each update

• First, update the block number of the virtual node

• Second, update its position in the sorted list
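The two-step update can be sketched with a list kept sorted by block count. This is an illustrative Python model, not the NameNode code: the class name, the `node_to_phy` mapping, and the greedy choice of the least-loaded nodes on distinct physical machines are all assumptions, and the greedy choice stands in for the serpentine order rather than reproducing it.

```python
import bisect


class BlockCountIndex:
    """Virtual nodes kept in a list sorted by (block count, node name)."""

    def __init__(self, node_to_phy):
        self.node_to_phy = dict(node_to_phy)  # virtual node -> physical machine
        self.count = {node: 0 for node in self.node_to_phy}
        self.sorted_nodes = sorted((0, node) for node in self.node_to_phy)

    def add_block(self, node):
        # Remove the stale entry, found by binary search
        old = (self.count[node], node)
        self.sorted_nodes.pop(bisect.bisect_left(self.sorted_nodes, old))
        # First, update the block number of the virtual node
        self.count[node] += 1
        # Second, update its position in the sorted list
        bisect.insort(self.sorted_nodes, (self.count[node], node))

    def choose_targets(self, replicas=3):
        """Pick the least-loaded nodes, at most one per physical machine."""
        targets, used_phy = [], set()
        for _, node in self.sorted_nodes:
            phy = self.node_to_phy[node]
            if phy not in used_phy:
                targets.append(node)
                used_phy.add(phy)
                if len(targets) == replicas:
                    break
        return targets


index = BlockCountIndex({"vm0": "Phy0", "vm1": "Phy0", "vm2": "Phy1", "vm3": "Phy2"})
first = index.choose_targets()
for node in first:
    index.add_block(node)
second = index.choose_targets()
print(first)   # ['vm0', 'vm2', 'vm3']
print(second)  # ['vm1', 'vm2', 'vm3']
```

Keeping the list sorted means the least-loaded candidates are always at the front, so choosing targets does not require scanning every node's count.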

Evaluation

Simulation comparing

• New strategy (serpentine allocation) and Hadoop’s original strategy

Parameter settings

n = 256

p = [8,16,32,64,128,256]

sampling number is set to 1,000,000

Evaluation

Comparison of maxBlockNum between Hadoop’s original strategy and the new strategy, using sampling

Evaluation

Comparison of actualReplicaNum between the original and new strategies

Evaluation

Comparison of blockNumSigma between the original and new strategies

Evaluation

Comparison of maxAssignedNum between the original and new strategies

Evaluation

Comparison of assignedNumSigma between the original and new strategies

Experiment

n = 224, p = 8, sampling number = 1,000,000

Metric (average)      Original    New
actualReplicaNum      2.0657      3
maxBlockNum           90.5798     84
blockNumSigma         4.1722      0
maxAssignedNum        33.7660     34.5946
assignedNumSigma      3.6256      4.14939

Experiment

Experiment results of RandomWriter’s execution time

Red: SC off; Blue: SC on

Experiment

Experiment results of TextSort’s execution time

Experiment

Experiment results of WordCount’s execution time

Conclusion

Address the problem of data allocation and its impact on MapReduce systems

Build a model and evaluation metrics to evaluate the data and task patterns

Propose a new strategy for file block allocation in Hadoop

Simulation and real experiment results

• support the new allocation strategy
