Constraint Programming and Monte-Carlo Tree-Search: Application to the Job Shop Problem

1. Catania, 11 January 2013. Constraint Programming and Monte-Carlo Tree Search: Application to the Job Shop Problem. Manuel Loth, Michèle Sebag, Youssef Hamadi, Marc Schoenauer, Christian Schulte.

2. Outline: 1 Constraint Programming; 2 Monte-Carlo Tree Search; 3 Adaptation; 4 Experiments. Manuel Loth, Constraint Programming and Monte-Carlo Tree Search, 11 January 2013, 2 / 30.

3. Constraint Programming. A programming framework for addressing decision and optimization problems, specified from a set of generic constraints, for which efficient propagators permit early detection of infeasibility on partial assignments (branch and cut). Optimization is handled by means of a moving constraint: objective-variable < {its value in the last solution}.

4. Constraint Programming. Generally not state-of-the-art, but: simple and versatile (many problems, possibility of user-specific constraints); extensible components (ad-hoc procedures); complete.

5. Components. Modelling from constraints (user task); branching: variable selection/ordering, with generic or ad-hoc schemes; search: value selection/ordering, backtracking; propagation: restrict variables' domains after an assignment (under the hood).

6. Search. [Tree figure, built up over slides 6-10.] $x_{i_1} = ?$

7. Search. $x_{i_1} = ?$ (0, 1).

8. Search. $x_{i_1} = ?$ (0, 1); $x_{i_2} = ?$ (0, 1).

9. Search. $x_{i_1} = ?$ (0, 1); $x_{i_2} = ?$ (0, 1); failure.

10. Search. $x_{i_1} = ?$ (0, 1); $x_{i_2} = ?$ (0, 1); failure; $x_{i_3} = ?$ (0); success.

11. Branching. Crucial component: addressing the good variables first can drastically reduce the tree height: dynamic variable ordering (wdeg), frequent restarts (Luby).

12. Solution-Guided Search. For optimization, searching around previous solution(s) can be effective: values in the last solution = left; left-most search.

13. Outline: 1 Constraint Programming; 2 Monte-Carlo Tree Search; 3 Adaptation; 4 Experiments.

14. Multi-armed bandits. [Figure: two arms $X_1$, $X_2$ with unknown means $\mu_1 = ?$, $\mu_2 = ?$.]

15-17. Multi-armed bandits. [Same figure, animation steps.] Which arm to sample? $X_1$, $X_2$ with unknown means $\mu_1$, $\mu_2$.
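The bandit setting on the slides above can be sketched as follows. This is only an illustration under stated assumptions: the rewards are taken to be Bernoulli (the slides do not specify a distribution), the hidden means 0.3 and 0.6 are arbitrary, and the names `BernoulliArm` and `estimate_means` are hypothetical. The arms are sampled in round-robin order while sample counts and empirical means are tracked.

```python
import random


class BernoulliArm:
    """One arm with a hidden mean; Bernoulli rewards are an assumption."""

    def __init__(self, mean, rng):
        self.mean = mean  # hidden from the learner
        self.rng = rng

    def sample(self):
        # Reward 1 with probability `mean`, else 0.
        return 1.0 if self.rng.random() < self.mean else 0.0


def estimate_means(arms, n_rounds):
    """Sample the arms in round-robin order; return counts and empirical means."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(n_rounds):
        i = t % len(arms)  # non-adaptive choice of arm
        sums[i] += arms[i].sample()
        counts[i] += 1
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    return counts, means


rng = random.Random(0)  # fixed seed so the run is repeatable
arms = [BernoulliArm(0.3, rng), BernoulliArm(0.6, rng)]
counts, means = estimate_means(arms, 2000)
print(counts, [round(m, 2) for m in means])
```

Round-robin sampling spends half of its budget on the worse arm; an adaptive bandit policy aims to shift samples toward the better arm while still estimating both.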
18. Multi-armed bandits. Which arm to sample? After $n_1$ and $n_2$ samples of arms $X_1$, $X_2$, the unknown means $\mu_1$, $\mu_2$ are estimated by the empirical means $\hat{\mu}_1$, $\hat{\mu}_2$.

19. Multi-armed bandits. Objective: $\min \sum_t (\mu^* - \mu_{I_t})$. Learn fast but surely; balance exploration and exploitation.

20. Multi-armed bandits. UCB (Auer et al., 2002): $I_t = \arg\max_i \hat{\mu}_i + C \sqrt{\frac{\log t}{n_i}}$

21. UCB. Optimal asymptotic rate; easy computation; interpretable; single intuitive parameter; not quite optimal but very robust to altered settings and violations of hypotheses.

22-26. Reinforcement Learning. [Tree figure, built up over slides 22-26: decision nodes "action?" with branches A and B; leaves labelled with rewards $r$, $R_1$, $R_2$.]

27. Reinforcement Learning. Iterate trials from the root, for learning the values of nodes/actions: maximum expected sum of rewards.

28. MCTS. [Same tree, with action A annotated by its visit count $n_A$ and average reward $\hat{\mu}_A$.] Nested-bandits approach to RL: learn and maximize the average reward after an action, which should converge in cascade to the actual value.

29-34. [Figure-only slides; no text survives.]

35. Adaptive top policy, storing statistics; systematic bottom policy, no or low storage; growing domain of the top policy, focusing on promising regions.

36. Outline: 1 Constraint Programming; 2 Monte-Carlo Tree Search; 3 Adaptation; 4 Experiments.

37. Reward. No direct candidate for a reward function: outcomes = failure | success; find a way to success from statistics on failures. Success = never fail, i.e. the longest branch; reward = (failure) depth.

38. Bottom policy. MC is not ideal, because: exhaustive search is wanted; value ordering (left preference) must be followed; in this deterministic, non-adversarial setting, more bias and less variance are needed. DFS offers: linear storage complexity; ordered exploration for more meaningful comparisons; the baseline on which to improve, by bringing adaptive diversification at the top. Adaptive form of Interleaved-DFS.

39. Bottom policy. A top node is opened every 5 trials.

40. Top policy. Two sources of attraction: depth and left. Most simple, elegant and efficient combination: differentiated exploration constants in UCB. $I_t = \arg\max_i \hat{\mu}_i + C \sqrt{\frac{\log t}{n_i}}$

41. Top policy. Two sources of attraction: depth and left. Most simple, elegant and efficient combination: differentiated exploration constants in UCB. $I_t = \arg\max_i \hat{\mu}_i + C_i \sqrt{\frac{\log t}{n_i}}$

42. Statistics storing and sharing. Statistics are processed for each variable (action), rather than for each node, because of: frequent restarts (all nodes are thrown away, few trials); RAVE (MCTS boosting by sharing statistics of similar actions in different nodes), which is especially relevant here: the same sequence of actions in a different order leads to the same state; the reward is an indirect indicator, used for biasing the search order, so there is no real need to converge. One bandit per variable, called each time the variable is addressed.

43. Outline: 1 Constraint Programming; 2 Monte-Carlo Tree Search; 3 Adaptation; 4 Experiments.

44. Job Shop Problem. Minimize makespan. [Figure: an instance with two jobs (J1, J2) and three machines (M1, M2, M3); per-job order (problem) + per-machine order (solution) = schedule.]

45. Compare different top policies. None: single DFS (baseline); Balanced: alternate left and right, non-adaptive, no left bias; ε-left: P(left) = 1 − ε (stochastic emulation of LDS), non-adaptive, left bias; UCB: adaptive, no left bias; UCB-left: adaptive, left bias.

46. [Figure: Balanced.]

47. [Figure: ε-left.]

48. [Figure: UCB.]

49. [Figure: UCB-left.]

50. MRE on Taillard 11-20. [Plot: mean relative error (y-axis, ticks at 0.01 and 0.1) versus iterations (x-axis, up to 100000) for DFS, Balanced, ε-left, UCB and UCB-left.]

51. Summary and developments. Bandit-Search based on failure depth brought significant improvements over DFS and reached CP state-of-the-art performance without ad-hoc components, improving a 2007 best-known result. Node statistics should also be used, for the last phase; extensive experiments on different problems are in progress; integration in a Gecode release.

52. Summary and developments (as on slide 51). Thank you for your attention!
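The top policy of slides 40-42 (one bandit per variable, UCB with differentiated exploration constants, failure depth as reward) might be sketched as below. This is a minimal illustration, not the authors' Gecode implementation. Several things are assumptions: variables are binary with arm 0 = left and arm 1 = right, the reward is a failure depth normalized to [0, 1], the constants `c_left=1.0` and `c_right=0.5` are arbitrary, and the class name `VariableBandit` is hypothetical. The toy loop at the end uses made-up deterministic rewards in place of a real search.

```python
import math


class VariableBandit:
    """UCB bandit for one binary variable: arm 0 = left, arm 1 = right."""

    def __init__(self, c_left=1.0, c_right=0.5):
        # Differentiated exploration constants C_i (values are assumptions);
        # a larger constant on the left arm biases exploration leftward.
        self.c = [c_left, c_right]
        self.n = [0, 0]          # number of trials per arm
        self.mean = [0.0, 0.0]   # average reward per arm

    def select(self):
        # Try each arm once, left first, before applying the UCB rule.
        for arm in (0, 1):
            if self.n[arm] == 0:
                return arm
        t = self.n[0] + self.n[1]

        def score(arm):
            bonus = self.c[arm] * math.sqrt(math.log(t) / self.n[arm])
            return self.mean[arm] + bonus

        # max() returns the first maximal element, so ties go to the left.
        return max((0, 1), key=score)

    def update(self, arm, reward):
        # Incremental update of the running average reward.
        self.n[arm] += 1
        self.mean[arm] += (reward - self.mean[arm]) / self.n[arm]


# Toy stand-in for the search: going right fails deeper (reward 0.8)
# than going left (reward 0.4). These numbers are invented for the demo.
toy_reward = {0: 0.4, 1: 0.8}
bandit = VariableBandit()
for _ in range(200):
    arm = bandit.select()
    bandit.update(arm, toy_reward[arm])
print(bandit.n, [round(m, 2) for m in bandit.mean])
```

In this toy run the right arm ends up chosen more often because its average reward is higher, while the larger left constant keeps the left arm from being abandoned. In the setting of slide 42, one such bandit would be kept per variable and queried each time that variable is branched on.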
