Page 1:

Learning to Branch
Ellen Vitercik
Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm
Published in ICML 2018

Page 2:

Integer Programs (IPs)

maximize c · x
subject to Ax ≤ b
x ∈ {0,1}^n

Page 3:

Facility location problems can be formulated as IPs.

Page 4:

Clustering problems can be formulated as IPs.

Page 5:

Binary classification problems can be formulated as IPs.

Page 6:

Integer Programs (IPs)

maximize c · x
subject to Ax = b
x ∈ {0,1}^n

NP-hard

Page 7:

Branch and Bound (B&B)

• Most widely used algorithm for IP solving (CPLEX, Gurobi)
• Recursively partitions the search space to find an optimal solution
• Organizes the partition as a tree
• Many parameters
  • CPLEX has a 221-page manual describing 135 parameters: "You may need to experiment."

Page 8:

Why is tuning B&B parameters important?

• Save time
• Solve more problems
• Find better solutions

Page 9:

B&B in the real world

A delivery company routes trucks daily, using integer programming to select the routes.
Demand changes every day, so the company solves hundreds of similar optimization problems.
Using this set of typical problems, can we learn the best parameters?

Page 10:

Model

An application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) to the algorithm designer, who outputs B&B parameters.

How can we use the samples to find the best B&B parameters for a given domain?

Page 11:

Model

The model (application-specific distribution → sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) → algorithm designer → B&B parameters) has been studied in applied communities [Hutter et al. '09].

Page 12:

Model

The model has also been studied from a theoretical perspective [Gupta and Roughgarden '16; Balcan et al. '17].

Page 13:

Model

1. Fix a set of B&B parameters to optimize
2. Receive sample problems from an unknown distribution
3. Find the parameters with the best performance on the samples

"Best" could mean smallest search tree, for example.

Page 14:

Questions to address

How do we find parameters that are best on average over the samples?
Will those parameters also perform well in expectation over the distribution?

Page 15:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 16:

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7
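The instance above is a one-constraint 0-1 IP (a knapsack). As a sanity check (not part of the talk), a brute-force sketch can enumerate all of {0,1}^7; this is exponential in the number of variables, so it only works for tiny instances like this one:

```python
from itertools import product

def solve_ip_bruteforce(c, A, b):
    """Maximize c . x subject to A x <= b, x in {0,1}^n, by enumeration."""
    n = len(c)
    best_val, best_x = None, None
    for x in product((0, 1), repeat=n):
        feasible = all(
            sum(a_j * x_j for a_j, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            val = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if best_val is None or val > best_val:
                best_val, best_x = val, x
    return best_val, best_x

val, x = solve_ip_bruteforce(
    c=(40, 60, 10, 10, 3, 20, 60),
    A=[(40, 50, 30, 10, 10, 40, 30)],
    b=[100],
)
# val == 133 with x == (0, 1, 0, 1, 1, 0, 1)
```

The optimum 133 is the same integral solution the B&B tree on the following slides eventually finds.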

Page 17:

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

LP relaxation at the root: x = (1/2, 1, 0, 0, 0, 0, 1), objective value 140

Page 18:

B&B

1. Choose a leaf of the tree

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140

Page 19:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136

Page 20:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136

Page 21:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 22:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 23:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 24:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 25:

B&B

1. Choose a leaf of the tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
    • x_3 = 0: x = (0, 1, 0, 1, 1, 0, 1), value 133
    • x_3 = 1: x = (0, 4/5, 1, 0, 0, 0, 1), value 118
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 26:

B&B

1. Choose a leaf of the tree
2. Branch on a variable
3. Fathom a leaf if:
   i. the LP relaxation solution is integral
   ii. the LP relaxation is infeasible
   iii. the LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
    • x_3 = 0: x = (0, 1, 0, 1, 1, 0, 1), value 133
    • x_3 = 1: x = (0, 4/5, 1, 0, 0, 0, 1), value 118
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 27:

B&B

1. Choose a leaf of the tree
2. Branch on a variable
3. Fathom a leaf if:
   i. the LP relaxation solution is integral
   ii. the LP relaxation is infeasible
   iii. the LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
    • x_3 = 0: x = (0, 1, 0, 1, 1, 0, 1), value 133 (integral)
    • x_3 = 1: x = (0, 4/5, 1, 0, 0, 0, 1), value 118
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 28:

B&B

1. Choose a leaf of the tree
2. Branch on a variable
3. Fathom a leaf if:
   i. the LP relaxation solution is integral
   ii. the LP relaxation is infeasible
   iii. the LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) · x
s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100
x ∈ {0,1}^7

Root: x = (1/2, 1, 0, 0, 0, 0, 1), value 140
• x_1 = 0: x = (0, 1, 0, 1, 0, 1/4, 1), value 135
  • x_6 = 0: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3
    • x_3 = 0: x = (0, 1, 0, 1, 1, 0, 1), value 133
    • x_3 = 1: x = (0, 4/5, 1, 0, 0, 0, 1), value 118
  • x_6 = 1: x = (0, 3/5, 0, 0, 0, 1, 1), value 116
• x_1 = 1: x = (1, 3/5, 0, 0, 0, 0, 1), value 136
  • x_2 = 0: x = (1, 0, 0, 1, 0, 1/2, 1), value 120
  • x_2 = 1: x = (1, 1, 0, 0, 0, 0, 1/3), value 120

Page 29:

B&B

1. Choose a leaf of the tree
2. Branch on a variable
3. Fathom a leaf if:
   i. the LP relaxation solution is integral
   ii. the LP relaxation is infeasible
   iii. the LP relaxation solution isn't better than the best-known integral solution

This talk: How do we choose which variable to branch on?
(Assume every other aspect of B&B is fixed.)
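The three steps and fathoming rules above can be sketched in code for the single-constraint (knapsack-style) IPs of the running example, where the LP relaxation has a closed-form greedy solution. This is an illustrative sketch, not the solver used in the paper; it branches on the fractional variable of the LP solution rather than using a tuned selection rule:

```python
def lp_relax(values, weights, cap, fixed):
    """LP relaxation of a single-constraint 0-1 IP with some variables fixed.

    Returns (objective, solution dict) or (None, None) if infeasible.
    The relaxation is solved greedily by value/weight ratio.
    """
    val = sum(values[i] for i, v in fixed.items() if v == 1)
    used = sum(weights[i] for i, v in fixed.items() if v == 1)
    if used > cap:
        return None, None
    x = dict(fixed)
    rem = cap - used
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        take = min(1.0, rem / weights[i])
        x[i] = take
        val += values[i] * take
        rem -= weights[i] * take
    return val, x

def branch_and_bound(values, weights, cap):
    """Return the optimal objective value, fathoming exactly as on the slide."""
    best = [0.0]
    def recurse(fixed):
        bound, x = lp_relax(values, weights, cap, fixed)
        if x is None:            # fathom: LP relaxation infeasible
            return
        if bound <= best[0]:     # fathom: no better than best-known solution
            return
        frac = [i for i, xi in x.items() if 0 < xi < 1]
        if not frac:             # fathom: LP solution is integral
            best[0] = bound
            return
        i = frac[0]              # branch on the fractional variable
        recurse({**fixed, i: 0})
        recurse({**fixed, i: 1})
    recurse({})
    return best[0]

# Running example from the slides:
opt = branch_and_bound((40, 60, 10, 10, 3, 20, 60),
                       (40, 50, 30, 10, 10, 40, 30), 100)
# opt == 133
```

On the slides' instance this sketch reproduces the tree shown above: root bound 140, the integral leaf 133, and the remaining leaves pruned by the bound.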

Page 30:

Variable selection policies can have a huge effect on tree size.

Page 31:

Outline

1. Introduction
2. Branch-and-Bound
   a. Algorithm Overview
   b. Variable Selection Policies
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 32:

Variable selection policies (VSPs)

Score-based VSP: at leaf Q, branch on the variable x_i maximizing score(Q, i).

Many options! Little is known about which to use when.

Example: at the leaf x = (1, 3/5, 0, 0, 0, 0, 1), value 136, branching on x_2 yields children with values 120 and 120.

Page 33:

Variable selection policies

For an IP instance Q:
• Let c_Q be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

Example. For the running example
max (40, 60, 10, 10, 3, 20, 60) · x s.t. (40, 50, 30, 10, 10, 40, 30) · x ≤ 100, x ∈ {0,1}^7,
the root Q has LP solution x = (1/2, 1, 0, 0, 0, 0, 1), so c_Q = 140.

Page 34:

Variable selection policies

For an IP instance Q:
• Let c_Q be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

Example. Branching on x_1 at the root of the running example:
c_Q = 140, c_{Q_1⁻} = 135 (x_1 = 0), c_{Q_1⁺} = 136 (x_1 = 1).

Page 35:

Variable selection policies

For an IP instance Q:
• Let c_Q be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺}) + (1 − μ) · max(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺})

Page 36:

Variable selection policies

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺}) + (1 − μ) · max(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺})

The (simplified) product rule [Achterberg, 2009]
Branch on the variable x_i maximizing:
score(Q, i) = (c_Q − c_{Q_i⁻}) · (c_Q − c_{Q_i⁺})

And many more…
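Both rules can be written directly in terms of the LP objective changes. A small sketch (function names are mine, not from the talk), checked against the example where c_Q = 140, c_{Q_1⁻} = 135, and c_{Q_1⁺} = 136:

```python
def linear_score(c_q, c_minus, c_plus, mu):
    """Linear rule [Linderoth & Savelsbergh, 1999], parameterized by mu."""
    d_minus, d_plus = c_q - c_minus, c_q - c_plus
    return mu * min(d_minus, d_plus) + (1 - mu) * max(d_minus, d_plus)

def product_score(c_q, c_minus, c_plus):
    """(Simplified) product rule [Achterberg, 2009]."""
    return (c_q - c_minus) * (c_q - c_plus)

# Branching on x_1 at the root of the running example:
linear_score(140, 135, 136, 0.5)   # 0.5 * 4 + 0.5 * 5 = 4.5
product_score(140, 135, 136)       # 5 * 4 = 20
```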

Page 37:

Variable selection policies

Our parameterized rule
Given d scoring rules score_1, …, score_d, branch on the variable x_i maximizing:
score(Q, i) = μ_1 · score_1(Q, i) + ⋯ + μ_d · score_d(Q, i)

Goal: learn the best convex combination μ_1 score_1 + ⋯ + μ_d score_d.
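The combination itself is one line of code. In the sketch below, `rules` are arbitrary callables score_k(Q, i); the two stand-in rules are hypothetical placeholders for illustration only:

```python
def combined_score(mus, rules, Q, i):
    """score(Q, i) = mu_1 * score_1(Q, i) + ... + mu_d * score_d(Q, i)."""
    return sum(mu * rule(Q, i) for mu, rule in zip(mus, rules))

# Two stand-in scoring rules (constants, purely illustrative):
rules = [lambda Q, i: 2.0, lambda Q, i: 4.0]
combined_score([0.25, 0.75], rules, None, 0)   # 0.25*2 + 0.75*4 = 3.5
```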

Page 38:

Model

An application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) to the algorithm designer, who outputs B&B parameters.

How can we use the samples to find the best B&B parameters for this domain?

Page 39:

Model

An application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) to the algorithm designer, who outputs the B&B parameters μ_1, …, μ_d.

Page 40:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
4. Experiments
5. Conclusion and Future Directions

Page 41:

First try: Discretization

1. Discretize the parameter space
2. Receive sample problems from an unknown distribution
3. Find the parameters in the discretization with the best average performance

[Plot: average tree size as a function of μ, with the discretized parameter values marked]

Page 42:

First try: Discretization

This has been the approach of prior work [e.g., Achterberg (2009)].

[Plot: average tree size as a function of μ]

Page 43:

Discretization gone wrong

[Plot: average tree size as a function of μ]

Page 44:

Discretization gone wrong

[Plot: average tree size as a function of μ]

This can actually happen!

Page 45:

Discretization gone wrong

Theorem [informal]. For any discretization, there exists a problem instance distribution 𝒟 inducing this behavior.

Proof ideas:
• 𝒟's support consists of infeasible IPs with "easy out" variables
• B&B takes exponential time unless it branches on the "easy out" variables
• B&B only finds the "easy outs" if it uses parameters from a specific range

[Plot: expected tree size as a function of μ]

Page 46:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
      i. Single-parameter settings
      ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions

Page 47:

Simple assumption

There exists a cap κ upper bounding the size of the largest tree we are willing to build.

This is a common assumption, e.g.:
• Hutter, Hoos, Leyton-Brown, Stützle, JAIR '09
• Kleinberg, Leyton-Brown, Lucier, IJCAI '17

Page 48:

Useful lemma

μ ∈ [0,1]

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

(The number of intervals is much smaller in our experiments!)

Page 49:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

Illustration: for each variable i, the combined score μ · score_1(Q, i) + (1 − μ) · score_2(Q, i) is a linear function of μ. The upper envelope of the lines for i = 1, 2, 3 partitions [0,1] into intervals on which B&B branches on x_2, on x_3, and on x_1, respectively.
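The picture above is the upper envelope of one line per variable. A sketch that computes these intervals for a single node (my own helper, not from the talk; `scores1[i]` and `scores2[i]` play the roles of score_1(Q, i+1) and score_2(Q, i+1)):

```python
def branching_intervals(scores1, scores2):
    """Partition [0, 1] into intervals on which the same variable maximizes
    mu * score1 + (1 - mu) * score2.  Returns [((lo, hi), argmax_index), ...]."""
    n = len(scores1)
    cuts = {0.0, 1.0}
    for i in range(n):
        for j in range(i + 1, n):
            # The lines s2 + mu * (s1 - s2) for variables i and j cross here:
            denom = (scores1[i] - scores2[i]) - (scores1[j] - scores2[j])
            if denom != 0:
                mu = (scores2[j] - scores2[i]) / denom
                if 0 < mu < 1:
                    cuts.add(mu)
    cuts = sorted(cuts)
    intervals = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        best = max(range(n), key=lambda k: mid * scores1[k] + (1 - mid) * scores2[k])
        if intervals and intervals[-1][1] == best:      # merge equal neighbors
            intervals[-1] = ((intervals[-1][0][0], hi), best)
        else:
            intervals.append(((lo, hi), best))
    return intervals

# A made-up instance reproducing the slide's picture: branch on x_2, then
# x_3, then x_1 as mu grows (indices 0, 1, 2 stand for x_1, x_2, x_3):
branching_intervals([5, 0, 3], [0, 5, 3])
# [((0.0, 0.4), 1), ((0.4, 0.6), 2), ((0.6, 1.0), 0)]
```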

Page 50:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

For any μ in the yellow interval (where x_2 maximizes the combined score), B&B branches on x_2 at the root Q, creating children Q_2⁻ (x_2 = 0) and Q_2⁺ (x_2 = 1).

Page 51:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

Recursing into Q_2⁻: the lines μ · score_1(Q_2⁻, i) + (1 − μ) · score_2(Q_2⁻, i) subdivide the yellow interval into subintervals on which B&B branches on x_2 then x_3, and on which it branches on x_2 then x_1.

Page 52:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

For any μ in the blue-yellow subinterval, B&B branches on x_2 at the root Q (children Q_2⁻, Q_2⁺) and then on x_3 at Q_2⁻.

Page 53:

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

Proof idea.
• Continue dividing [0,1] into intervals such that within each interval, the variable selection order is fixed
• We can subdivide only a finite number of times
• The proof follows by induction on the tree depth

Page 54:

Learning algorithm (μ ∈ [0,1])

Input: a set of IPs sampled from a distribution 𝒟.

For each IP, set μ = 0. While μ < 1:
1. Run B&B using μ · score_1 + (1 − μ) · score_2, resulting in tree T
2. Find the interval [μ, μ′] such that for any μ″ ∈ [μ, μ′], B&B run with the scoring rule μ″ · score_1 + (1 − μ″) · score_2 builds the same tree T (takes a little bookkeeping)
3. Set μ = μ′

Return: any μ̂ from the interval minimizing average tree size.
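The "little bookkeeping" in step 2 can be done per node: record each candidate variable's score_1 and score_2 values, compute the μ-interval on which the variable actually chosen stays the argmax, and intersect these intervals over all nodes of T. A sketch of the per-node computation (my own helper, under those assumptions; the full algorithm would intersect these intervals over the whole tree):

```python
def stable_interval(s1, s2, mu):
    """Interval of mu' on which the variable chosen at this node
    (the argmax of mu*s1[k] + (1-mu)*s2[k]) does not change."""
    n = len(s1)
    i = max(range(n), key=lambda k: mu * s1[k] + (1 - mu) * s2[k])
    lo, hi = 0.0, 1.0
    for j in range(n):
        if j == i:
            continue
        # Chosen line minus line j is b + mu' * a; we need it to stay >= 0.
        a = (s1[i] - s2[i]) - (s1[j] - s2[j])
        b = s2[i] - s2[j]
        if a > 0:
            lo = max(lo, -b / a)
        elif a < 0:
            hi = min(hi, -b / a)
        # a == 0: the inequality holds for all mu', since it holds at mu.
    return lo, hi

# With score vectors s1 = (1, 0), s2 = (0, 1) and mu = 0.7, variable 0 is
# chosen, and it stays the argmax exactly for mu' in [0.5, 1]:
stable_interval([1, 0], [0, 1], 0.7)   # (0.5, 1.0)
```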

Page 55:

Learning algorithm guarantees

Let μ̂ be the algorithm's output given Õ(κ³/ε² · ln(# variables)) samples. With high probability,

E_{Q∼𝒟}[tree-size(Q, μ̂)] − min_{μ∈[0,1]} E_{Q∼𝒟}[tree-size(Q, μ)] < ε.

Proof intuition: bound the algorithm class's intrinsic complexity (IC).
• The lemma bounds the number of "truly different" parameters
• Parameters that are "the same" come from a simple set
Learning theory then lets us translate IC into sample complexity.

Page 56:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
      i. Single-parameter settings
      ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions

Page 57:

Useful lemma: higher dimensions

Lemma: For any d scoring rules and any IP, a set H of O((# variables)^(κ+2)) hyperplanes partitions [0,1]^d such that for any connected component R of [0,1]^d ∖ H, B&B builds the same tree across all μ ∈ R.

Page 58:

Learning-theoretic guarantees

Fix d scoring rules and draw samples Q_1, …, Q_N ∼ 𝒟.

If N = Õ(κ³/ε² · ln(d · # variables)), then w.h.p., for all μ ∈ [0,1]^d,

|1/N · Σ_{i=1}^N tree-size(Q_i, μ) − E_{Q∼𝒟}[tree-size(Q, μ)]| < ε.

Average tree size generalizes to expected tree size.

Page 59:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 60:

Experiments: Tuning the linear rule

Let score_1(Q, i) = min(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺})
and score_2(Q, i) = max(c_Q − c_{Q_i⁻}, c_Q − c_{Q_i⁺}).

Our parameterized rule
Branch on the variable x_i maximizing:
score(Q, i) = μ · score_1(Q, i) + (1 − μ) · score_2(Q, i)

This is the linear rule [Linderoth & Savelsbergh, 1999].

Page 61:

Experiments: Combinatorial auctions

Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.

"Regions" generator: 400 bids, 200 goods, 100 instances
"Arbitrary" generator: 200 bids, 100 goods, 100 instances

Page 62:

Additional experiments

Facility location: 70 facilities, 70 customers, 500 instances
Clustering: 5 clusters, 35 nodes, 500 instances
Agnostically learning linear separators: 50 points in ℝ², 500 instances

Page 63:

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 64:

Conclusion

• We study B&B, a widely used algorithm for combinatorial problems
• We show how to use ML to weight variable selection rules
  • These are the first sample complexity bounds for tree search algorithm configuration
  • Unlike prior work [Khalil et al. '16; Alvarez et al. '17], which is purely empirical
• We empirically show that our approach can dramatically shrink tree size
  • We prove this improvement can even be exponential
• The theory applies to other tree search algorithms, e.g., for solving CSPs

Page 65:

Future directions

How can we train faster?
• We don't want to build every tree B&B would make for every training instance
• Can we train on small IPs and then apply the learned policies to large IPs?

Which other tree-building applications can our techniques be applied to?
• E.g., building decision trees and taxonomies

How can we attack other learning problems in B&B?
• E.g., node-selection policies

Thank you! Questions?