1
Multi-core Structural SVM Training
Kai-Wei Chang, Department of Computer Science, University of Illinois at Urbana-Champaign
Joint Work With Vivek Srikumar and Dan Roth
2
Decisions are structured in many applications: global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome. It is essential to make coherent decisions in a way that takes the interdependencies into account. E.g., part-of-speech tagging (sequential labeling):
Input: a sequence of words. Output: part-of-speech tags {NN, VBZ, …}.
“A cat chases a mouse” => “DT NN VBZ DT NN”. The assignment to a tag can depend on both the word and the neighboring tags. The feature vector is defined on both input and output variables: a feature can pair a word with its tag (e.g., “cat” with NN) or pair consecutive tags (e.g., the tag bigram “VBZ-VBZ”).
Motivation
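As a concrete illustration of features defined over both input and output variables, here is a small sketch of such a feature map for tag sequences (hypothetical code, not from the talk; names are illustrative):

```python
# Sketch of a feature vector phi(x, y) for sequence tagging (illustrative):
# emission features pair each word with its tag; transition features pair
# consecutive tags, so the score of a tag depends on its neighbors.
from collections import Counter

def phi(words, tags):
    """Sparse feature vector over (word, tag) and (tag, tag) pairs."""
    feats = Counter()
    prev = "<START>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1   # e.g. ("emit", "cat", "NN")
        feats[("trans", prev, tag)] += 1  # e.g. ("trans", "VBZ", "VBZ")
        prev = tag
    return feats

f = phi("a cat chases a mouse".split(), ["DT", "NN", "VBZ", "DT", "NN"])
```

The tag bigram features are what make the decisions interdependent: changing one tag changes the features of its neighbor.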
3
Inference with General Constraint Structure [Roth&Yih’04,07]
Recognizing Entities and Relations
Dole’s wife, Elizabeth, is a native of N.C.
E1 E2 E3
R12 R23
[Figure: candidate label scores for each entity (other, per, loc) and each relation (irrelevant, spouse_of, born_in).]
Improvement over no inference: 2-5%
4
Structured prediction: predicting a structured output variable y based on the input variable x. The output variables form a structure: sequences, clusters, trees, or arbitrary graphs. Structure comes from interactions between the output variables through mutual correlations and constraints. TODAY:
How to efficiently learn models that are used to make global decisions. We focus on training a structural SVM model. Various approaches have been proposed [Joachims et al. 09, Chang and Yih 13, Lacoste-Julien et al. 13], but they are single-threaded. DEMI-DCD: a multi-threaded algorithm for training structural SVM.
Structured Learning and Inference
5
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
6
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
7
Inference constitutes predicting the best scoring structure:

y* = argmax_{y ∈ Y(x)} w^T φ(x, y)

Here w is the weight parameters (to be estimated during learning), φ(x, y) is the feature vector on input-output, and Y(x) is the set of allowed structures, often specified by constraints.
Efficient inference algorithms have been proposed for some specific structures. An integer linear programming (ILP) solver can deal with general structures.
Structured Prediction: Inference
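For sequence tagging, the argmax can be computed exactly by dynamic programming. A toy Viterbi sketch, assuming the score decomposes into emission and transition terms (names and scores are illustrative, not from the talk):

```python
# Illustrative inference for sequence tagging: Viterbi dynamic programming
# finds argmax_y w . phi(x, y) when the score decomposes into per-position
# emission scores and tag-bigram transition scores.

def viterbi(words, tags, emit_score, trans_score):
    """Return the highest-scoring tag sequence under additive scores."""
    # best[t] = (score, path) of the best sequence ending in tag t
    best = {t: (emit_score(words[0], t), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            s, p = max(
                (best[prev][0] + trans_score(prev, t), best[prev][1])
                for prev in tags
            )
            new_best[t] = (s + emit_score(word, t), p + [t])
        best = new_best
    return max(best.values())[1]

# Toy scores that prefer the intended tags for the running example.
tags = ["DT", "NN", "VBZ"]
emit = lambda w, t: 2.0 if (w, t) in {("a", "DT"), ("cat", "NN"),
                                      ("chases", "VBZ"), ("mouse", "NN")} else 0.0
trans = lambda a, b: 1.0 if (a, b) in {("DT", "NN"), ("NN", "VBZ"),
                                       ("VBZ", "DT")} else 0.0
best_tags = viterbi("a cat chases a mouse".split(), tags, emit, trans)
```

For general structures without such a decomposition, the slide's point is that one falls back on an ILP solver.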
8
Given a set of training examples {(x_i, y_i)}, i = 1, …, l, solve the following optimization problem to learn w:

min_{w, ξ} (1/2) w^T w + C Σ_i ℓ(ξ_i)
s.t. w^T φ(x_i, y_i) − w^T φ(x_i, y) ≥ Δ(y_i, y) − ξ_i, for all examples i and feasible structures y ∈ Y(x_i)

Here w^T φ(x_i, y_i) is the score of the gold structure, w^T φ(x_i, y) is the score of a predicted structure, Δ(y_i, y) is the loss function, and ξ_i is the slack variable. w^T φ(x, y) is the scoring function used in inference. We use ℓ(ξ) = ξ² (L2-loss structural SVM).
Structural SVM
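To make the objective concrete, here is a toy computation of the L2-loss primal value, enumerating the feasible structures explicitly (real tasks would call an inference routine instead of enumerating; all names are illustrative):

```python
# Sketch of the L2-loss structural SVM primal objective on a toy problem
# where each example lists its feasible structures explicitly.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal_objective(w, examples, C, phi, delta):
    """examples: list of (x, gold_y, feasible_ys)."""
    obj = 0.5 * dot(w, w)
    for x, gold, ys in examples:
        gold_score = dot(w, phi(x, gold))
        # structured hinge: slack set by the most violated structure
        xi = max(0.0, max(delta(gold, y) - gold_score + dot(w, phi(x, y))
                          for y in ys))
        obj += C * xi ** 2  # L2 loss: l(xi) = xi^2
    return obj

# Toy instance: two structures "A" (gold) and "B" with indicator features.
phi = lambda x, y: (1.0, 0.0) if y == "A" else (0.0, 1.0)
delta = lambda gold, y: 0.0 if y == gold else 1.0
obj = primal_objective((1.0, 0.0), [(None, "A", ["A", "B"])], 1.0, phi, delta)
```

With w = (1, 0), the gold structure scores 1, the margin to "B" exactly matches Δ = 1, so the slack is 0 and the objective reduces to the regularizer 0.5.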
9
Quadratic programming with bounded constraints:

min_{α ≥ 0} D(α), where w(α) = Σ_{i,y} α_{i,y} (φ(x_i, y_i) − φ(x_i, y))

Each dual variable α_{i,y} corresponds to a different feasible structure y, so the number of variables can be exponentially large. Relationship between w and α: for a linear model, maintain w = w(α) throughout the learning process [Hsieh et al. 08].
Dual Problem of Structural SVM
10
Maintain an active set A_i of dual variables for each example: identify the α_{i,y} that will be non-zero at the end of the optimization process.
In a single-thread implementation, training consists of two phases:
Select and maintain A_i (active set selection step). This requires solving a loss-augmented inference problem for each example; solving loss-augmented inferences is usually the bottleneck.
Update the values of α (learning step). This requires (approximately) solving a sub-problem. Related to cutting-plane methods.
Active Set
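The active set selection step can be sketched as follows (illustrative: enumeration stands in for a real loss-augmented inference solver, and the tolerance is an assumed parameter):

```python
# Sketch of one active set selection step: run loss-augmented inference for
# an example and add the most violated structure to its active set if it
# violates the margin constraint by more than a tolerance.

def dot(w, f):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def select_active(w, x, gold, feasible, phi, delta, active, tol=1e-3):
    # loss-augmented inference: argmax_y  delta(gold, y) + w . phi(x, y)
    ybar = max(feasible, key=lambda y: delta(gold, y) + dot(w, phi(x, y)))
    violation = delta(gold, ybar) - dot(w, phi(x, gold)) + dot(w, phi(x, ybar))
    if violation > tol:
        active.add(ybar)
    return active

# Toy run: with a zero model, the wrong structure "B" violates the margin.
phi = lambda x, y: {y: 1.0}
delta = lambda gold, y: 0.0 if y == gold else 1.0
A = select_active({}, "x", "A", ["A", "B"], phi, delta, set())
```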
11
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
DEMI-DCD: Decoupled Model-update and Inference with Dual Coordinate Descent.
Let p be the number of threads, and split the training data into p − 1 parts.
Active set selection (inference) threads: each selects and maintains the active set A_i for every example i in its part.
Learning thread: loops over all examples and updates the model w.
w and the active sets A_i are shared between threads using shared memory buffers.
Overview of DEMI-DCD
[Diagram: the learning thread and the active set selection threads exchange w and the A_i through shared buffers.]
12
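A minimal sketch of this thread layout (an assumed structure for illustration, not the authors' implementation): a lock-guarded shared buffer holds w and the per-example active sets; the learning thread writes w while inference threads add structures to the active sets.

```python
# Minimal sketch of the DEMI-DCD thread layout (assumed structure):
# a shared buffer holds the model w and the per-example active sets,
# guarded by a lock so the two kinds of threads can run concurrently.
import threading

class SharedBuffer:
    def __init__(self, n_examples):
        self.lock = threading.Lock()
        self.w = {}                                  # model, written by learner
        self.active = [set() for _ in range(n_examples)]  # per-example A_i

    def read_w(self):
        with self.lock:
            return dict(self.w)   # inference threads read a snapshot

    def write_w(self, w):
        with self.lock:
            self.w = dict(w)      # only the learning thread writes w

    def add_active(self, i, y):
        with self.lock:
            self.active[i].add(y)  # inference threads extend A_i

# Two threads touching disjoint pieces of shared state, then joined.
buf = SharedBuffer(2)
t1 = threading.Thread(target=buf.add_active, args=(0, "y1"))
t2 = threading.Thread(target=buf.write_w, args=({"f": 1.0},))
t1.start(); t2.start(); t1.join(); t2.join()
```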
Sequentially visit each instance i and each structure y in its active set A_i, and update α_{i,y}. To update α_{i,y}, solve the following one-variable sub-problem:

min_d D(α + d e_{i,y}) s.t. α_{i,y} + d ≥ 0

Then set α_{i,y} ← α_{i,y} + d and w ← w + d (φ(x_i, y_i) − φ(x_i, y)). Shrinking heuristic: remove y from A_i if α_{i,y} = 0 and the gradient indicates it will remain zero.
Learning Thread
13
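The coordinate descent step can be sketched as follows. This is a hedged, simplified version: it applies the closed-form L2-loss update of Hsieh et al. 08 to one dual variable in isolation, whereas the exact structural sub-problem couples the variables of an example that share a slack; names are illustrative.

```python
# Simplified one-variable dual coordinate descent step on alpha_{i,y}.
# phi_delta = phi(x_i, y_i) - phi(x_i, y); the update keeps the invariant
# w = sum over (i, y) of alpha_{i,y} * phi_delta.

def dcd_step(w, alpha, phi_delta, loss, C):
    w_dot_pd = sum(wi * pi for wi, pi in zip(w, phi_delta))
    denom = sum(p * p for p in phi_delta) + 1.0 / (2 * C)
    d = (loss - w_dot_pd - alpha / (2 * C)) / denom
    new_alpha = max(0.0, alpha + d)  # projection onto alpha >= 0
    w = [wi + (new_alpha - alpha) * pi for wi, pi in zip(w, phi_delta)]
    return w, new_alpha

# One step from a zero model: the variable moves toward satisfying the
# margin constraint (loss 1, unit feature difference, C = 1).
w, a = dcd_step([0.0, 0.0], 0.0, [1.0, 0.0], 1.0, 1.0)
```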
DEMI-DCD requires little synchronization. Only the learning thread can write w, and only one active set selection thread can modify each A_i. Each thread maintains a local copy of w and copies to/from a shared buffer after a fixed number of iterations.
Synchronization
14
15
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
A master-slave architecture (MS-DCD): given p processors, split the data into p parts. At each iteration:
The master sends the current model w to the slave threads. Each slave thread solves the loss-augmented inference problems associated with its data block and updates the active set. After all slave threads finish, the master thread updates the model according to the active set. Implemented in JLIS.
A Parallel Dual Coordinate Descent Algorithm
[Diagram: master sends the current w to the slaves; slaves solve loss-augmented inference and update A; master updates w based on A.]
16
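The master-slave iteration can be sketched with a thread pool (an illustrative pattern, not the JLIS implementation; the barrier after the pool is what makes the learning step wait for the slowest slave):

```python
# Sketch of one MS-DCD iteration: the master broadcasts w, slaves run
# inference on their data blocks in parallel, then the master updates w
# from the merged active-set additions.
from concurrent.futures import ThreadPoolExecutor

def slave(w, block, infer):
    """Run inference for one data block; return (index, structure) pairs."""
    return [(i, infer(w, x, gold)) for i, x, gold in block]

def master_iteration(w, blocks, infer, update):
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        results = list(pool.map(lambda b: slave(w, b, infer), blocks))
    # implicit barrier: all slaves have finished before the model update
    merged = [item for part in results for item in part]
    return update(w, merged)

# Toy run with two blocks and stand-in inference/update functions.
blocks = [[(0, "x0", "g0")], [(1, "x1", "g1")]]
result = master_iteration(None, blocks,
                          lambda w, x, g: x.upper(),
                          lambda w, merged: sorted(merged))
```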
Structured Perceptron [Collins 02]: at each iteration, pick an example (x, y) and find the best structured output ŷ according to the current model w. Then update w ← w + η (φ(x, y) − φ(x, ŷ)) with a learning rate η.
SP-IPM [McDonald et al. 10]:
1. Split the data into parts.
2. Train a Structured Perceptron on each data block in parallel.
3. Mix the models using a linear combination.
4. Repeat Step 2, using the mixed model as the initial model.
Structured Perceptron and its Parallel Version
17
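Both updates are simple enough to sketch (illustrative code; SP-IPM's mixing weights are assumed uniform here):

```python
# Sketch of the structured perceptron update and SP-IPM's model mixing.

def perceptron_update(w, phi_gold, phi_pred, lr=1.0):
    """Move w toward the gold structure's features, away from the prediction."""
    return [wi + lr * (g - p) for wi, g, p in zip(w, phi_gold, phi_pred)]

def mix_models(models, weights=None):
    """Linear combination of per-shard models (uniform weights by default)."""
    weights = weights or [1.0 / len(models)] * len(models)
    dim = len(models[0])
    return [sum(wt * m[j] for wt, m in zip(weights, models))
            for j in range(dim)]

# One perceptron step, then a uniform mix of two shard models.
w = perceptron_update([0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
mixed = mix_models([[2.0, 0.0], [0.0, 2.0]])
```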
18
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
19
Experiment Settings
POS tagging (POS-WSJ): assign a POS label to each word in a sentence. We use the standard Penn Treebank Wall Street Journal corpus with 39,832 sentences.
Entity and Relation Recognition (Entity-Relation): assign entity types to mentions and identify relations among them. 5,925 training samples. Inference is solved by an ILP solver.
We compare the following methods: DEMI-DCD: the proposed method. MS-DCD: a master-slave style parallel implementation of DCD. SP-IPM: parallel Structured Perceptron.
20
Convergence on Primal Function Value
Relative primal function value difference over training time: POS-WSJ and Entity-Relation (log scale).
21
Test Performance
Test performance over training time
POS-WSJ
SP-IPM converges to a different model
22
Test Performance
Test performance over training time: Entity-Relation task
Entity F1 Relation F1
23
Moving Average of CPU Usage
POS-WSJ Entity-Relation
DEMI-DCD fully utilizes CPU power
CPU usage drops because of the synchronization
25
Structural SVM: Inference and Learning DEMI-DCD for Structural SVM Related Work Experiments Conclusions
Outline
26
Conclusion
We proposed DEMI-DCD for training structural SVM on multi-core machines.
The proposed method decouples the model update and inference phases of learning.
As a result, it can fully utilize all available processors to speed up learning.
Software will be available at: http://cogcomp.cs.illinois.edu/page/software
Thank you.