Distributed Machine Learning: An Intro. - UESTC (dm.uestc.edu.cn/wp-content/uploads/seminar/Distributed ML.pdf)

Transcript
Page 1:

Chen Huang

Distributed Machine Learning: An Intro.

Feature Engineering Group,

Data Mining Lab,

Big Data Research Center, UESTC

Page 2:

Contents

‒ Background

‒ Some Examples

‒ Model Parallelism & Data Parallelism

‒ Parallelization Mechanisms
  Synchronous / Asynchronous

‒ Parallelization Frameworks
  MPI / AllReduce / MapReduce / Parameter Server
  GraphLab / Spark GraphX

Page 3:

Background

Why Distributed ML?

• Big data problem
  • Efficient algorithms
  • Online learning / data streams → feasible
• What about high dimensionality?
  • Distributed machines: the more, the merrier

Page 4:

Background

Why Distributed ML?

• Big data
  • Efficient algorithms
  • Online learning / data streams
  • Distributed machines
• Big model
  • Model split
  • Model distributed

Page 5:

Background

Distributed Machine Learning

• Big model over big data

Page 6:

Background

Overview

Distributed Machine Learning

‒ Motivation

‒ Big model over big data

‒ DML

‒ Multiple workers cooperate with each other via communication

‒ Target
  ‒ Get the job done (convergence, …)
  ‒ Minimize communication cost (I/O, …)
  ‒ Maximize effectiveness (time, performance, …)

Page 7:

Example

K-means

Page 8:

Example

Distributed K-means
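
The figure on this slide is not in the transcript. As a rough illustration of the idea (a sketch, not the slide's own code; the function names are invented), here is a data-parallel K-means step in Python/NumPy: every worker assigns its local data shard to the nearest centroid and returns only per-cluster sums and counts, and the driver combines them to recompute the centroids.

import numpy as np

def local_stats(shard, centroids):
    # Worker side: assign each local point to its nearest centroid and
    # return per-cluster sums and counts (the only data sent back).
    dists = np.linalg.norm(shard[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = shard[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def distributed_kmeans(shards, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = rng.choice(np.vstack(shards), size=k, replace=False)
    for _ in range(iters):
        # "Map": every worker computes local statistics in parallel.
        stats = [local_stats(s, centroids) for s in shards]
        # "Reduce": the driver aggregates and recomputes the centroids.
        sums = sum(s for s, _ in stats)
        counts = sum(c for _, c in stats)
        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centroids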

Page 9:

Example

Spark K-means

Page 10:

Example

Spark K-means

Page 11:

Example

Spark K-means
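
Pages 9-11 show Spark code that did not survive the transcript. As a stand-in (a minimal sketch, not the original slide code; the input file points.csv and the columns x, y are assumptions), K-means with Spark MLlib in Python looks roughly like this:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()

# Hypothetical input: a CSV with numeric columns x and y.
df = spark.read.csv("points.csv", header=True, inferSchema=True)
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Spark distributes the assignment and update steps over the executors.
model = KMeans(k=3, seed=1, featuresCol="features").fit(features)
print(model.clusterCenters())

spark.stop()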

Page 12:

Example

Item filter

‒ Given two files, output the key-value pairs in file B whose keys exist in file A.
‒ File B is super large (e.g. 100 GB).
‒ What if A is also super large?

File A: Key1, Key12, Key3, Key5, …
File B: (Key1, val1), (Key2, val2), (Key4, val4), (Key3, val3), (Key5, val5), …

Page 13:

Example

Item filter

[Figure: files A and B are each split into chunks (A: Key1 | Key12, Key3 | Key5 | …; B: (Key1, val1) | (Key2, val2), (Key4, val4) | (Key3, val3), (Key5, val5) | …) that are processed separately.]

Page 14:

Example

Item filter

[Figure: both A and B are hash-partitioned on the key, so records with the same key land in the same partition (e.g. Key1, Key3 and their values in one partition; Key12, Key5 in another), and the filter can then run locally within each partition.]
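
A minimal single-process sketch of this hash-partitioning idea (illustrative only; in a real job each bucket pair would go to a different worker, and the bucket count would be far larger):

def bucket_of(key, n_buckets):
    # The same hash on both files guarantees that matching keys share a bucket.
    return hash(key) % n_buckets

def item_filter(file_a_keys, file_b_pairs, n_buckets=4):
    # Phase 1: partition A's keys and B's (key, value) pairs by hash bucket.
    a_buckets = [set() for _ in range(n_buckets)]
    b_buckets = [[] for _ in range(n_buckets)]
    for key in file_a_keys:
        a_buckets[bucket_of(key, n_buckets)].add(key)
    for key, val in file_b_pairs:
        b_buckets[bucket_of(key, n_buckets)].append((key, val))
    # Phase 2: each bucket pair can be filtered independently on its own worker.
    out = []
    for a_keys, b_pairs in zip(a_buckets, b_buckets):
        out.extend((k, v) for k, v in b_pairs if k in a_keys)
    return out

A = ["Key1", "Key12", "Key3", "Key5"]
B = [("Key1", "val1"), ("Key2", "val2"), ("Key4", "val4"), ("Key3", "val3"), ("Key5", "val5")]
print(item_filter(A, B))   # keeps Key1, Key3, Key5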

Page 15:

Distributed Machine Learning

Overview

* See the AAAI 2017 Workshop on Distributed Machine Learning for more information

Page 16:

Distributed Machine Learning

How To Distribute

Key Problems

‒ How to “split”
  ‒ Data parallelism / model parallelism
  ‒ Data / parameter dependency
‒ How to aggregate messages
  ‒ Parallelization mechanisms
  ‒ Consensus between local & global parameters
  ‒ Does the algorithm converge?
‒ Other concerns
  ‒ Communication cost, …

Page 17:

Distributed Machine Learning

How To Split

How To Distribute

‒ Data Parallelism
  1. Data partition
  2. Parallel training
  3. Combine local updates
  4. Refresh local models with the new parameters
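
A compact sketch of these four steps (illustrative only; train_locally is a made-up stand-in for any local learner, here one pass of logistic-regression SGD):

import numpy as np

def train_locally(w, X, y, lr=0.1):
    # Step 2, per worker: improve the local copy of the model on the local shard.
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-xi @ w))
        w = w + lr * (yi - p) * xi
    return w

def data_parallel_round(w, shards):
    # Step 1 is assumed done: `shards` is a list of (X, y) partitions.
    local_models = [train_locally(w.copy(), X, y) for X, y in shards]   # step 2
    w_new = np.mean(local_models, axis=0)                               # step 3: combine
    return w_new                                                        # step 4: refresh every worker with w_new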

Page 18:

Distributed Machine Learning

How To Split

How To Distribute

‒ Model Parallelism
  1. Partition the model across multiple local workers
  2. Workers collaborate with each other to perform the optimization
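
A toy sketch of model parallelism for a linear model (not from the slides; the squared loss, block layout and learning rate are assumptions): the weight vector is split into blocks, each worker owns one block plus the matching feature columns, the partial scores are summed once per pass, and each worker then updates only its own block.

import numpy as np

def model_parallel_step(X_blocks, w_blocks, y, lr=0.01):
    # X_blocks[i] holds the feature columns owned by worker i,
    # w_blocks[i] the matching slice of the parameter vector.
    # Forward: each worker computes a partial score; the sum of the partial
    # scores is the full prediction (the only cross-worker communication).
    pred = np.sum([Xb @ wb for Xb, wb in zip(X_blocks, w_blocks)], axis=0)
    err = pred - y   # shared residual, sent back to every worker
    # Backward (squared loss): each worker updates only its own block.
    return [wb - lr * (Xb.T @ err) / len(y) for Xb, wb in zip(X_blocks, w_blocks)]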

Page 19:

Distributed Machine Learning

How To Split

How To Distribute

‒ Model Parallelism & Data Parallelism

Example: Distributed Logistic Regression

Page 20:

Distributed Machine Learning

How To Split

Categories

‒ Data Parallelism
  ‒ Split the data into many sample sets
  ‒ Workers calculate the same parameter(s) on different sample sets
‒ Model Parallelism
  ‒ Split the model / parameters
  ‒ Workers calculate different parameter(s) on the same data set
‒ Hybrid Parallelism

Page 21:

Distributed Machine Learning

How To Split

Data / Parameter Split

‒ Data Allocation
  ‒ Random selection (shuffling)
  ‒ Partitioning (e.g. item filter, word count)
  ‒ Sampling
  ‒ Parallel graph computation (for non-i.i.d. data)
‒ Parameter Split
  ‒ Most algorithms assume parameters are independent and split them randomly
  ‒ Petuum (KDD’15, Eric Xing)

Page 22:

Distributed Machine Learning

How To Aggregate Messages

Parallelization Mechanisms

• Given the feedback g_i(w) of worker i, how can we update the model parameter W?

  W = f(g_1(w), g_2(w), …, g_m(w))

Page 23:

Distributed Machine Learning

Parallelization Mechanism

Bulk Synchronous Parallel (BSP)

‒ Synchronous update
  ‒ Parameters are updated only after all workers have finished their jobs
‒ Example: Sync SGD (mini-batch SGD), Hadoop

Page 24:

Distributed Machine Learning

Sync SGD

‒ Perceptron

Page 25:

Distributed Machine Learning

Sync SGD

W ← W − Σ_i ∇W_i,   where each worker i computes the perceptron gradient over its misclassified samples M_i:   ∇W_i = − Σ_{x_j ∈ M_i} x_j y_j

(In the slide's figure, workers 1, 2, 3 each hold a copy of W and send ∇W_1, ∇W_2, ∇W_3 back to be summed.)
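
A minimal sketch of this synchronous update (assumptions: fixed data shards and unit step size; this is not the slide's original code): each worker computes the perceptron gradient over its own misclassified points, and W moves only after every worker's contribution has arrived.

import numpy as np

def perceptron_grad(W, X, y):
    # Only misclassified points (y * <W, x> <= 0) contribute: -sum(x_j * y_j).
    margins = y * (X @ W)
    M = margins <= 0
    return -(X[M] * y[M, None]).sum(axis=0)

def sync_sgd_round(W, shards):
    # Bulk synchronous step: wait for every worker's gradient, then update once.
    grads = [perceptron_grad(W, X, y) for X, y in shards]
    return W - sum(grads)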

Page 26:

Distributed Machine Learning

Parallelization Mechanism

Asynchronous Parallel

‒ Asynchronous update
  ‒ Parameters are updated whenever feedback is received from a worker
‒ Example: Downpour SGD (NIPS’12)
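
A toy in-process simulation of the asynchronous mechanism (a sketch of the idea only, not Downpour SGD's actual implementation; the least-squares gradient and learning rate are placeholders): workers pull the current parameters, compute a gradient on their shard, and push it immediately, so the server may apply updates computed from parameters that are already stale.

import threading
import numpy as np

class AsyncServer:
    def __init__(self, dim, lr=0.01):
        self.W = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()   # protects W; updates apply as they arrive

    def pull(self):
        with self.lock:
            return self.W.copy()

    def push(self, grad):
        with self.lock:
            self.W -= self.lr * grad   # no waiting for the other workers

def worker(server, X, y, steps=100):
    for _ in range(steps):
        W = server.pull()                    # may be stale by the time we push
        grad = X.T @ (X @ W - y) / len(y)    # least-squares gradient on this shard
        server.push(grad)

# Usage (with hypothetical shards X1, y1 and X2, y2):
#   server = AsyncServer(dim=X1.shape[1])
#   ts = [threading.Thread(target=worker, args=(server, X, y)) for X, y in [(X1, y1), (X2, y2)]]
#   for t in ts: t.start()
#   for t in ts: t.join()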

Page 27:

Distributed Machine Learning

Downpour SGD

Page 28:

Distributed Machine Learning

Async. vs. Sync.

‒ Sync.
  ‒ Single point of failure: the update has to wait until all workers have finished their jobs, so the overall efficiency of the algorithm is determined by the slowest worker.
  ‒ Nice convergence
‒ Async.
  ‒ Very fast!
  ‒ May hurt the convergence of the algorithm (e.g. stale gradients)
  ‒ Use it if the model is not sensitive to asynchronous updates

Page 29:

Distributed Machine Learning

Parallelization Mechanism

ADMM for DML

‒ Alternating Direction Method of Multipliers
  ‒ Augmented Lagrangian + dual decomposition
  ‒ A well-known optimization algorithm in both industry and academia (e.g. computational advertising)

For the DML case: replace x_2^{k−1} with mean(x_2^{k−1}) and x_1^k with mean(x_1^k) when updating.
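
A compact sketch of consensus ADMM for a distributed least-squares problem (the generic textbook form under an assumed squared loss and penalty rho, not the slide's exact derivation): each worker solves a small local subproblem in parallel, and the global variable is exactly the mean mentioned above.

import numpy as np

def consensus_admm(A_shards, b_shards, rho=1.0, iters=50):
    n, m = A_shards[0].shape[1], len(A_shards)
    z = np.zeros(n)
    xs = [np.zeros(n) for _ in range(m)]
    us = [np.zeros(n) for _ in range(m)]
    for _ in range(iters):
        # Local x-update (each worker in parallel): ridge-like solve toward the consensus z.
        for i, (A, b) in enumerate(zip(A_shards, b_shards)):
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b + rho * (z - us[i]))
        # Global z-update: just the mean of the local solutions (plus duals).
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual u-update (each worker): accumulate its disagreement with the consensus.
        for i in range(m):
            us[i] = us[i] + xs[i] - z
    return z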

Page 30:

Distributed Machine Learning

Parallelization Mechanisms

Overview

‒ Sync.

‒ Async

‒ ADMM

‒ Model Average

‒ Elastic Averaging SGD (NIPS’15)

‒ Lock Free: Hogwild! (NIPS’11)

‒ ……

Page 31:

Distributed ML Framework

Page 32:

Distributed Machine Learning

Frameworks

This is a joke, please laugh…

Page 33:

Distributed Machine Learning

Frameworks

Message Passing Interface (MPI)

‒ Parallel computing architecture

‒ Many operations:

‒ send, receive, broadcast, scatter, gather…

Page 34:

Distributed Machine Learning

Frameworks

Message Passing Interface (MPI)

‒ Parallel computing architecture

‒ Many operations:

‒ AllReduce = reduce + broadcast

‒ Hard to write code!
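
While general MPI programs do take work, the collective itself is short. With mpi4py (an assumption; the slides do not name a particular binding), launched as, e.g., mpirun -n 4 python allreduce_demo.py, every rank contributes a local gradient and every rank receives the sum:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each worker computes some local gradient (random here, just for illustration).
local_grad = np.random.default_rng(rank).normal(size=10)

# AllReduce = reduce (sum) + broadcast: all ranks end up with the same total.
total_grad = np.zeros_like(local_grad)
comm.Allreduce(local_grad, total_grad, op=MPI.SUM)

print(f"rank {rank}: total_grad[0] = {total_grad[0]:.4f}")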

Page 35:

Distributed Machine Learning

Frameworks

MapReduce

‒ Well-encapsulated code, user-friendly!
‒ Dedicated scheduler
‒ Integration with HDFS, fault tolerance, …
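
The programming model reduces to two user-supplied functions, map and reduce; the framework handles splitting, shuffling, scheduling and fault tolerance. A plain-Python simulation of the model (not Hadoop's actual API) with the classic word count:

from collections import defaultdict

def map_fn(line):
    # Emit (key, value) pairs.
    for word in line.split():
        yield word, 1

def reduce_fn(key, values):
    return key, sum(values)

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                      # map phase
        for k, v in map_fn(line):
            groups[k].append(v)             # shuffle: group values by key
    return [reduce_fn(k, vs) for k, vs in groups.items()]   # reduce phase

print(mapreduce(["big model over big data", "big data"]))
# [('big', 3), ('model', 1), ('over', 1), ('data', 2)]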

Page 36:

Distributed Machine Learning

Frameworks

MapReduce

‒ Synchronous parallel, single point of failure
‒ Data spilling: intermediate results are spilled to disk (数据溢写)
‒ Not so suitable for machine learning tasks
  ‒ Many ML models are solved iteratively, and Hadoop/MapReduce does not naturally support iterative computation
  ‒ Spark does
‒ Iterative MapReduce-style machine learning toolkits
  ‒ Hadoop Mahout
  ‒ Spark MLlib

Page 37:

Distributed Machine Learning

Frameworks

GraphLab (UAI’10, VLDB’12)

‒ Distributed computing framework for graphs
‒ Splits the graph into sub-graphs by node cut
‒ Asynchronous parallel

Page 38:

Distributed Machine Learning

Frameworks

GraphLab (UAI’10, VLDB’12)

‒ Data Graph + Update Function + Sync Operation

‒ Data graph
‒ Update function: user-defined function, working on scopes
‒ Sync: global parameter update

Scopes are allowed to overlap.

Page 39:

Distributed Machine Learning

Frameworks

GraphLab (UAI’10, VLDB’12)

‒ Data Graph + Update Function + Sync Operation

‒ Three steps = Gather + Apply + Scatter
  ‒ Gather: read only
  ‒ Apply: write the node only
  ‒ Scatter: write edges only
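
A tiny single-machine sketch of the Gather-Apply-Scatter pattern, using PageRank as the update function (illustrative only; real GraphLab runs this per vertex, asynchronously and distributed):

def gas_pagerank(out_edges, iters=20, d=0.85):
    # out_edges: dict mapping each vertex to the list of vertices it links to.
    vertices = list(out_edges)
    rank = {v: 1.0 / len(vertices) for v in vertices}
    in_edges = {v: [u for u in vertices if v in out_edges[u]] for v in vertices}
    for _ in range(iters):
        new_rank = {}
        for v in vertices:
            # Gather (read only): collect rank / out-degree from in-neighbours.
            acc = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            # Apply (write the vertex only): update this vertex's own value.
            new_rank[v] = (1 - d) / len(vertices) + d * acc
            # Scatter (write edges only): in GraphLab this is where neighbours
            # whose data changed enough would be signalled to re-run.
        rank = new_rank
    return rank

print(gas_pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))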

Page 40:

Distributed Machine Learning

Frameworks

GraphLab: Consistency Control

‒ Trade-off between conflict and parallelization

‒ Full consistency: during an update, nothing within the scope may be read or written by other operations
‒ Edge consistency: during an update, nothing within the scope except the neighboring vertices may be read or written by other operations
‒ Vertex consistency: during an update, other operations may not read or write the vertex being updated

Page 41:

Distributed Machine Learning

Frameworks

Spark GraphX

‒ Avoids the cost of moving sub-graphs among workers by combining the Table view and the Graph view

Page 42:

Distributed Machine Learning

Frameworks

Spark GraphX

‒ Avoids the cost of moving sub-graphs among workers by combining the Table view and the Graph view

Page 43:

Distributed Machine Learning

Frameworks

Parameter Server

‒ Asynchronous parallel
  1. Workers query for the current parameters
  2. Parameters are stored in a distributed way among the server nodes
  ‒ Workers calculate partial parameters
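
A toy sketch of the storage side (class and method names are invented for illustration; real systems add replication and consistency control): the parameter vector is range-partitioned across several server shards, and a worker pulls the current values and pushes (partial) gradients, which are applied as soon as they arrive.

import numpy as np

class ServerShard:
    # Holds one slice of the global parameter vector.
    def __init__(self, size, lr=0.1):
        self.params = np.zeros(size)
        self.lr = lr

    def pull(self):
        return self.params.copy()

    def push(self, grad):
        self.params -= self.lr * grad   # applied immediately: asynchronous

class ParameterServer:
    def __init__(self, dim, n_shards):
        self.sizes = [len(r) for r in np.array_split(np.arange(dim), n_shards)]
        self.shards = [ServerShard(s) for s in self.sizes]

    def pull_all(self):
        return np.concatenate([s.pull() for s in self.shards])

    def push_all(self, grad):
        for shard, g in zip(self.shards, np.split(grad, np.cumsum(self.sizes)[:-1])):
            shard.push(g)

# Hypothetical worker loop:
#   ps = ParameterServer(dim=1000, n_shards=4)
#   w = ps.pull_all(); grad = compute_gradient(w, local_shard); ps.push_all(grad)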

Page 44:

Distributed Machine Learning

Frameworks

Parameter Server

‒ Asynchronous parallel

Page 45:

Distributed Machine Learning

DML Trends Overview

• For more information, see the AAAI-17 Tutorial on Distributed Machine Learning

Page 46:

Distributed Machine Learning

Take Home Message

‒ How to “split”
  ‒ Data parallelism / model parallelism
  ‒ Data / parameter dependency
‒ How to aggregate messages
  ‒ Parallelization mechanisms
  ‒ Consensus between local & global parameters
  ‒ Does the algorithm converge?
‒ Frameworks

Page 47:

Thanks