Computational Intro: Conservation and Biodiversity
Wildlife Corridor Design
Topics in Computational Sustainability, Spring 2010
Joint work with Jon Conrad, Bistra Dilkina, Willem van Hoeve, Ashish Sabharwal, and Jordan Suter
Carla P. Gomes
Outline
– Wildlife corridor design problem: problem definition
– How hard is it to solve? Concepts of problem complexity
– How to model it? Mixed Integer Programming formulation and other issues
– How to solve it? How to scale up solutions?
– Experimental results
– Research questions
Conservation and Biodiversity: Wildlife Corridors
New York Times (Science), 2006
Wildlife corridors
– Preserve wildlife against land fragmentation.
– Link core biological areas, allowing animal movement between areas.
– Limited budget; must maximize environmental benefits/utility.
Conservation and Biodiversity: Grizzly Bear Wildlife Corridors
Wildlife corridors link core biological areas, allowing animal movement between areas.
Typically: low budgets to implement corridors.
Example:
Goal: preserve grizzly bear populations in the U.S. Northern Rockies by creating wildlife corridors connecting 3 reserves:
– Yellowstone National Park
– Glacier Park
– Salmon-Selway Ecosystem
Real-world instance: Grizzly Bear Corridor in the Northern Rockies
[Figure: maps of the study area (~320,000 sq km) showing parcel cost and habitat suitability. Estimating habitat suitability can be a challenging machine learning problem.]
Wildlife Corridor Design: Problem Definition (Informal English Definition)
Instance:
– A set of parcels and their neighborhood relationships
– A set of reserves or terminals (subset of the parcels)
– The cost and the utility (habitat suitability) per parcel
Question:
– What is the set of connected parcels, containing the reserves, that maximizes the utility such that the total cost does not exceed a given budget C?
[Figure: grid of land parcels with the reserves marked; cost and utility info omitted.]
Example
[Figure: example grid labeled with per-parcel cost and utility, showing three solutions.]
– Min cost solution: Cost = 7; Utility = 5
– Budget 10: Cost = 10; Utility = 9
– Budget 11: Cost = 11; Utility = 10
Wildlife Corridor Design (Graph Representation)
Input:
– A set of parcels and their neighborhood relationships
– A set of reserves or terminals (subset of the parcels)
– The cost and the utility (habitat suitability) per parcel
Output:
– A set of connected parcels, containing the reserves, that maximizes the utility such that the total cost does not exceed a given budget C
[Figure: parcel map with reserves and land parcels, next to its undirected graph representation G = (V, E). Cost and utility info omitted in the pictures.]
The Connection Subgraph Problem (Optimization Version)
Instance
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Vertex cost function $c(v)$; utility function $u(v)$
– Cost bound / budget $C$
Question
What is the subgraph $H$ of $G$ with maximum utility such that
– $H$ is connected and contains $T$
– $\mathrm{cost}(H) \le C$?
Utility optimization version: given $C$, maximize utility.
Cost optimization version: given $U$, minimize cost.
The Connection Subgraph Problem (Decision Version)
Instance
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Vertex cost function $c(v)$; utility function $u(v)$
– Cost bound / budget $C$; desired utility $U$
Question
Is there a subgraph $H$ of $G$ such that
– $H$ is connected and contains $T$
– $\mathrm{cost}(H) \le C$ and $\mathrm{utility}(H) \ge U$?
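The decision version asks whether a suitable subgraph exists; checking a proposed answer is the easy part. The sketch below is only an illustration of that check, assuming a vertex-induced subgraph given as a set of vertices and an adjacency-list dictionary. The function name, the data layout, and the tiny example graph are hypothetical and not taken from the lecture.

```python
from collections import deque

def is_connection_subgraph(adj, H, terminals, cost, utility, C, U):
    """Check whether vertex set H certifies a 'yes' answer to the decision version."""
    H = set(H)
    # Assumption: an empty vertex set is not accepted as a corridor.
    if not H or not set(terminals) <= H:
        return False
    # Breadth-first search restricted to H tests connectivity.
    start = next(iter(H))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in H and w not in seen:
                seen.add(w)
                queue.append(w)
    if seen != H:
        return False
    # Budget and utility checks.
    return sum(cost[v] for v in H) <= C and sum(utility[v] for v in H) >= U

# Hypothetical 4-vertex path a - b - c - d with terminals a and c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cost = {"a": 1, "b": 2, "c": 1, "d": 5}
utility = {"a": 3, "b": 1, "c": 3, "d": 4}
```

With budget C = 4 and desired utility U = 7, the set {a, b, c} passes the check, while {a, c} fails because it is not connected. The check itself runs in time linear in the size of the graph; the difficulty discussed in the following slides lies in finding such a set.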
Connection Subgraph: other possible applications
Social networks
– What characterizes the connection between two individuals? The shortest path? The size of the connected component? A “good” connected subgraph?
– If a person is infected with a disease, who else is likely to be?
– Which people have unexpected ties to any members of a list of other individuals?
Vertices in the graph: people; edges: whether two people know each other.
[Faloutsos, McCurley, Tompkins ’04]
Project: Find other applications of the connection subgraph problem and its variants, and apply/extend the ideas presented in this lecture.
How hard (complex) is it to solve the connection subgraph problem?
Before answering this question…
How do computer scientists differentiate between good (efficient) and bad (inefficient) algorithms?
The yardstick: any algorithm that runs in no more than polynomial time is an efficient algorithm; everything else is not.
Ordered functions by their growth rates
Order  Name             Function
1      constant         c
2      logarithmic      lg n
3      polylogarithmic  lg^c n
4      sublinear        n^r, 0 < r < 1
5      linear           n
6      subquadratic     n^r, 1 < r < 2
7      quadratic        n^2
8      cubic            n^3
9      polynomial       n^c, c ≥ 1
10     exponential      r^n, r > 1
Efficient algorithms: orders 1 to 9 (at most polynomial). Not efficient algorithms: order 10 (exponential).
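To make the gap between the polynomial and exponential rows concrete, the following minimal sketch tabulates a few of the functions for small instance sizes. The chosen sizes and the base 2 for the exponential are arbitrary illustrative choices, not values from the lecture.

```python
import math

def growth_table(sizes):
    """Return rows (n, lg n, n, n^2, n^3, 2^n) for each instance size n."""
    rows = []
    for n in sizes:
        rows.append((n, math.log2(n), n, n**2, n**3, 2**n))
    return rows

for n, lg, lin, quad, cub, expo in growth_table([10, 20, 40, 80]):
    print(f"n={n:>3}  lg n={lg:5.2f}  n^2={quad:>6}  n^3={cub:>8}  2^n={expo}")
```

Doubling n multiplies n^2 by 4 and n^3 by 8, but it squares 2^n, which is why exponential-time algorithms stop being usable at quite modest instance sizes.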
Roughly Speaking…
[Figure: run time (cost) versus instance size N, with curves for constant, logarithmic, linear, quadratic (N^2), and exponential growth. Polynomial vs. exponential growth (Harel 2000).]
– Exponential: binary B&B algorithm
– Polynomial: LP’s interior point, min-cost flow algorithms, transportation algorithm, assignment algorithm, Dijkstra’s algorithm
How can we show a problem is efficiently solvable?
– We can show it constructively: we provide an algorithm and show that it solves the problem efficiently. E.g.:
– Shortest path problem: Dijkstra’s algorithm runs in polynomial time. Therefore the shortest path problem can be solved efficiently.
– Linear programming: the interior point method has polynomial worst-case complexity. Therefore linear programming can be solved efficiently.
(*) The simplex method has exponential worst-case complexity. However, in practice the simplex algorithm seems to scale as m^3, where m is the number of functional constraints.
How can we show a problem is not efficiently solvable?
– How do you prove a negative? Much harder!!!
– This is the aim of complexity theory.
Easy (efficiently solvable) problems vs. Hard Problems
Easy problems: we consider a problem X to be “easy”, or efficiently solvable, if there is a polynomial time algorithm A for solving X. We denote by P the class of problems solvable in polynomial time.
Hard problems: everything else. Any problem for which there is no polynomial time algorithm is an intractable problem.
Hard Computational Problems Scale Exponentially in the Worst Case
[Figure: exponential function vs. polynomial function; exponential-time algorithms and explosive combinatorics.]
Application areas:
– Experiment design
– Software & hardware verification
– Satisfiability, e.g. (A or B) and (D or E or not A)
– Data analysis & data mining
– Fiber optics routing
– Capital budgeting and financial applications
– Information retrieval
– Protein folding and medical applications
– Combinatorial auctions
– Planning and scheduling, and supply chain management
Many more applications!!!
Tackling practical size instances requires powerful computational and mathematical tools!
NP-Complete and NP-Hard Problems
How hard (complex) is the connection subgraph problem?
The connection subgraph problem is NP-hard.
Connections in networks: Hardness of feasibility versus optimality. Conrad, J., C. Gomes, W.-J. van Hoeve, A. Sabharwal, and J. Suter. Proc. CPAIOR 07, 2007, pages 16–28.
Unfortunately, that means we don’t know of good, efficient (polynomial time) algorithms to solve this problem.
We believe the connection subgraph problem is intractable: computer scientists only know of exponential time algorithms to solve it. They strongly believe that no polynomial time algorithm will ever be found, but there is no proof either way.
Should we give up on finding good solutions?
The connection subgraph problem is NP-hard, but that is a worst-case result! Real-world problems are not necessarily worst case, and they possess hidden sub-structure that can be exploited, allowing solutions to scale up.
Single Commodity Flow Encoding
– Variables: $x_i$, binary variable for each vertex $i$ (1 if included in the corridor; 0 otherwise); $y_{ij}$, continuous variable for the flow on each edge $ij$
– Cost constraint: $\sum_i c_i x_i \le C$
– Utility optimization function: maximize $\sum_i u_i x_i$
– Connectedness: use a single commodity flow encoding
[Figure: example graph with a root (r) and flows on its edges; max flow = 9.]
Single Commodity Flow: MIP
[The formulation appears as an image in the original slide. Its parts are labeled: max utility (objective), budget constraint, reserves, total flow, flow balance, and incoming edges allowed only if selected. An annotation reads “This is what makes the problem hard”.]
Note: E’ is the set of directed edges, obtained by replacing each undirected edge of E with two directed edges.
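The formulas of the MIP did not survive in this transcript, only their labels. The block below is one plausible single-commodity-flow formulation that matches those labels; it is a sketch under assumptions, not necessarily the exact model on the slide. It assumes a root $r \in T$ that acts as the flow source, flow variables $y_{ij} \ge 0$ on the directed edges of $E'$, and the capacity $|V| - 1$, all of which are choices made here for illustration.

```latex
\begin{align*}
\max \quad & \sum_{i \in V} u_i x_i && \text{(max utility)} \\
\text{s.t.} \quad & \sum_{i \in V} c_i x_i \le C && \text{(budget constraint)} \\
& x_t = 1 \quad \forall t \in T && \text{(reserves)} \\
& \sum_{j:(r,j) \in E'} y_{rj} - \sum_{j:(j,r) \in E'} y_{jr}
    = \sum_{i \in V \setminus \{r\}} x_i && \text{(total flow)} \\
& \sum_{j:(j,i) \in E'} y_{ji} - \sum_{j:(i,j) \in E'} y_{ij}
    = x_i \quad \forall i \in V \setminus \{r\} && \text{(flow balance)} \\
& 0 \le y_{ji} \le (|V| - 1)\, x_i \quad \forall (j,i) \in E'
    && \text{(incoming edges allowed only if selected)} \\
& x_i \in \{0, 1\} \quad \forall i \in V
\end{align*}
```

In this sketch every selected vertex other than the root absorbs one unit of flow, and flow can only enter selected vertices. An unselected vertex therefore receives nothing and, by flow balance, sends nothing, so each selected vertex must be reachable from the root through selected vertices. That is how the flow variables enforce connectedness with a polynomial number of constraints.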
Solving the Mixed Integer Programming Encoding
Cplex: state-of-the-art MIP solver
– Branch and bound
– LP relaxation
– Cut generation
[Figure: pipeline: connection subgraph instance → MIP model (feasibility + optimization) → CPLEX → solution]
Synthetic Instances for Evaluation
Problem evaluated on semi-structured graphs
– m x m lattice / grid graph with k terminals (inspired by the conservation corridors problem)
– Place one terminal at the top-left corner and one at the bottom-right corner (maximizes grid use)
– Place the remaining terminals randomly
– Assign uniform random costs and utilities from {0, 1, …, 10}
[Figure: example instance with m = 4, k = 4]
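The instance recipe above is simple enough to sketch in a few lines. The generator below follows the listed steps; the function name, the coordinate-pair vertex labels, the dictionary layout, and the `seed` parameter are illustrative assumptions rather than details of the original experiments.

```python
import random

def make_lattice_instance(m, k, seed=None):
    """Build an m x m grid graph with k terminals and random costs/utilities."""
    if m < 2 or not 2 <= k <= m * m:
        raise ValueError("need m >= 2 and 2 <= k <= m*m")
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(m) for c in range(m)]
    adj = {v: [] for v in nodes}
    for r, c in nodes:
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
            w = (r + dr, c + dc)
            if w in adj:
                adj[(r, c)].append(w)
                adj[w].append((r, c))
    # Two fixed terminals in opposite corners, the rest placed at random.
    corners = [(0, 0), (m - 1, m - 1)]
    others = [v for v in nodes if v not in corners]
    terminals = corners + rng.sample(others, k - 2)
    # Uniform random integer costs and utilities from {0, 1, ..., 10}.
    cost = {v: rng.randint(0, 10) for v in nodes}
    utility = {v: rng.randint(0, 10) for v in nodes}
    return adj, terminals, cost, utility
```

For example, `make_lattice_instance(4, 4, seed=0)` yields a 16-vertex grid with 24 undirected edges and four distinct terminals, two of them in opposite corners.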
Standard MIP Results: without terminals
No terminals: “find the connected component that maximizes the utility within the given budget”
– Pure optimization problem; always feasible
– Still NP-hard
[Figure: median run time (log scale) versus budget fraction (0 to 0.8) for 6 x 6, 8 x 8, and 10 x 10 grids]
A clear easy-hard-easy pattern with uniform random costs & utilities
Note 1: plot in log scale for better viewing of the sharp transitions
Note 2: each data point is the median over 100+ random instances
Standard MIP Results: with 3 terminals (feasibility vs. optimization)
Split instances into feasible and infeasible; plot median runtime
– For feasible ones: computation involves proving optimality
– For infeasible ones: computation involves proving infeasibility
Infeasible instances take much longer than the feasible ones!
Problem?
– MIP + Cplex is really weak at feasibility testing
– Poor scaling: couldn’t even get close to handling real data
Can we do better?
(Ashish Sabharwal, CP-AI-OR '08, May 23, 2008)
A Related Problem (ignoring utilities): Minimum Cost Solution, the Steiner Tree Problem
Input
– An undirected graph $G = (V, E)$
– Terminal vertices $T \subseteq V$
– Edge cost function $c(e)$
Question
What is the subgraph $H$ of $G$ with minimum cost such that $H$ is connected and contains $T$?
If the edge costs are all positive, then the resulting subgraph is a tree: any cycle would contain an edge that could be removed to lower the cost while keeping H connected.
The Steiner Tree Problem: Min cost tree connecting the terminals
Also NP-hard, but:
– With only two terminals, it reduces to the shortest path problem (e.g., Dijkstra’s algorithm or an algorithm based on dynamic programming)
– With a bounded number of terminals, there is a fixed-parameter tractable algorithm
The Steiner Tree Problem: Min cost tree connecting the terminals
Three terminals (as in the case of our grizzly bear problem)
– Algorithm: to connect the three terminals, find where to place the root of the tree by computing all-pairs shortest paths (an easy algorithm based on dynamic programming, or even Dijkstra’s)
– The algorithm is also used as the starting point of a greedy solution: start with the minimum cost corridor and extend it greedily, picking nodes in decreasing order of util/cost ratio to use the remaining budget
– The algorithm is also used for pruning: nodes that are so far away that connecting them to the terminals would exceed the budget can be pruned
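The three-terminal case can be sketched directly: try every vertex as the root (branching point) and keep the one whose three shortest paths to the terminals are cheapest in total. The code below is a minimal illustration under stated assumptions: non-negative edge costs as in the Steiner tree definition, a connected graph, an adjacency dictionary of the form `adj[v] = {w: edge_cost}`, and one Dijkstra run per terminal instead of a full all-pairs matrix. All names are hypothetical.

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances and predecessors from source; adj[v] = {w: edge_cost}."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for w, c in adj[v].items():
            nd = d + c
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                prev[w] = v
                heapq.heappush(heap, (nd, w))
    return dist, prev

def steiner_three_terminals(adj, terminals):
    """Minimum-cost tree connecting three terminals, via the best branching vertex."""
    runs = [dijkstra(adj, t) for t in terminals]
    # Candidate roots: vertices reachable from all three terminals.
    candidates = [v for v in adj if all(v in dist for dist, _ in runs)]
    root = min(candidates, key=lambda v: sum(dist[v] for dist, _ in runs))
    total = sum(dist[root] for dist, _ in runs)
    edges = set()
    for _, prev in runs:
        v = root
        while v in prev:  # walk back from the root to the terminal
            edges.add(frozenset((v, prev[v])))
            v = prev[v]
    return root, total, edges

# Hypothetical example: hub x joined to a, b, c, plus an expensive direct edge a - b.
example = {
    "a": {"x": 1, "b": 10},
    "b": {"x": 2, "a": 10},
    "c": {"x": 3},
    "x": {"a": 1, "b": 2, "c": 3},
}
root, total, edges = steiner_three_terminals(example, ["a", "b", "c"])
```

On this example the best root is the hub `x`, with total cost 6 and the three hub edges as the tree. The approach is exact for three terminals because an optimal tree has at most one branching vertex; the corridor problem itself uses vertex costs, which would need a node-weighted distance instead.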
Solving the connection subgraph problem: Two Phase Approach
1st Phase: runs an algorithm based on the minimum Steiner tree and produces a greedy solution
– This phase runs in polynomial time for a constant number of terminal nodes.
2nd Phase: refines the greedy solution to produce an optimal solution with Cplex
Solving the connection subgraph problem: Phase I
1st Phase: runs an algorithm based on the minimum Steiner tree
– Produces the minimum cost solution
– Produces shortest path information used for pruning the search space: the all-pairs-shortest-paths matrix
– Produces a greedy (and often sub-optimal) solution for feasible instances (the parcels with the highest util/cost ratio are selected to use the remaining budget)
This phase runs in polynomial time for a constant number of terminal nodes.
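The greedy step of Phase I can be sketched as follows. The slides only say that parcels are picked by util/cost ratio until the budget is used; restricting the candidates to neighbors of the current corridor (so that it stays connected) and treating zero-cost parcels as infinitely attractive are assumptions of this sketch, as are all names in it.

```python
def greedy_extend(adj, corridor, cost, utility, budget):
    """Grow a connected corridor by repeatedly adding the best-ratio affordable neighbor."""
    chosen = set(corridor)
    remaining = budget - sum(cost[v] for v in chosen)
    if remaining < 0:
        raise ValueError("starting corridor already exceeds the budget")

    def ratio(v):
        # Assumption: zero-cost parcels are treated as infinitely attractive.
        return float("inf") if cost[v] == 0 else utility[v] / cost[v]

    while True:
        # Affordable vertices adjacent to the current corridor.
        frontier = {w for v in chosen for w in adj[v]
                    if w not in chosen and cost[w] <= remaining}
        if not frontier:
            return chosen
        best = max(frontier, key=ratio)
        chosen.add(best)
        remaining -= cost[best]

# Hypothetical example: path a - b - c - d - e with a side parcel f attached to b.
adj = {"a": ["b"], "b": ["a", "c", "f"], "c": ["b", "d"],
       "d": ["c", "e"], "e": ["d"], "f": ["b"]}
cost = {"a": 1, "b": 1, "c": 2, "d": 4, "e": 1, "f": 1}
utility = {"a": 0, "b": 0, "c": 4, "d": 2, "e": 5, "f": 1}
```

Starting from the corridor {a, b} with budget 5, the sketch adds c and then f and stops, never reaching the valuable parcel e behind the expensive parcel d. This kind of short-sightedness is why the greedy result is only a starting solution for Phase II.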
Solving the connection subgraph problem: Phase II
Refines the greedy solution to produce an optimal solution with Cplex
– The greedy solution is passed to Cplex as the starting solution (Cplex can change it).
– The all-pairs-shortest-paths matrix computed in Phase I is also passed on to Phase II. It is used to statically (i.e., at the beginning) prune away all nodes that are easily deduced to be too far away to be part of a solution (e.g., if the minimum Steiner tree containing that node and all of the terminal vertices already exceeds the budget). This significantly reduces the size of the search space, often in the range of 40-60%.
Computes an optimal solution (or the optimal extended-mincost solution) to the utility-maximization version of the connection subgraph problem.
Solving the Connection Subgraph Problem: Exploiting Structure (A Hybrid MIP/CP Approach)
[Figure: pipeline from the connection subgraph instance to the solution]
– Feasibility: ignore utilities and compute the min-cost Steiner tree, which yields the min-cost solution and the APSP matrix
– Greedily extend the min-cost solution to fill the budget (“like” knapsack: max u/c), giving a higher-utility feasible solution
– Optimization: the MIP model goes to CPLEX together with the starting solution and dynamic pruning (40-60% pruned); CPLEX returns the solution
Example APSP matrix shown in the figure:
0 3 6 2 8
3 0 7 4 1
6 7 0 5 9
2 4 5 0 1
8 1 9 1 0
Conrad, Gomes, van Hoeve, Sabharwal, Suter 2008
10x10 random lattices, 3 reserves
– ~20x improvement in runtime on feasible instances
– Infeasible instances solved instantaneously!
10x10 random lattices, 3 reserves
– Peak of hardness still strongly correlated with budget slack
– Gap between optimal and extended-optimal solutions
Real Data, 50x50km Parcels
Gap between optimal and extended-optimal solutions peaks in a critical region right after min-cost
Real Data, 40x40km Parcels
Gap between optimal and extended-optimal solutions peaks in a critical region right after min-cost
Encodings
– Complete methods (proof of optimality): Other MIP formulations that scale better in practice? Other formulations that allow us to prove optimality faster? Other paradigms (e.g., constraint based, SAT modulo theories, extensions of SAT solvers, mixed logic programming)?
– Incomplete methods (cannot prove optimality but may find good solutions): simulated annealing, genetic algorithms, etc.
– Hybrid complete/incomplete methods
Bistra Dilkina is interested in these issues.
Approximation results
– Cost optimization: NP-hard to approximate within a factor of 1.36
– Utility version?
Related Work
Moss & Rabani 2001/2007
– Node-Weighted Steiner Tree: costs and utilities on nodes
– Approximation results
Costa et al. 2006/2008/2009
– Steiner Tree with Budget, Revenues and Hop Constraints
– Costs and utilities on edges
– Directed Steiner Tree encoding and Branch-and-Cut
Models Are Important!!!
Single Commodity Flow
– Quite compact (poly size)
Directed Steiner Tree
– Captures the connectedness structure better!
– Exponential number of constraints!
– Provides good upper bounds!
Conrad, Dilkina, Gomes, van Hoeve, Sabharwal, Suter 2007, 2008, 2009
A broad class of applications for projects
A family of problems: spatially targeted interventions
– Conservation and biodiversity: site selection, reserve network design, wildlife corridors
– Social welfare: portfolios of asset-based poverty interventions
Bistra Dilkina 2009
Spatially targeted interventions
Select a subset A of a set U of spatially explicit actions
– Maximize a sustainability function F
– Such that the cost of the actions does not exceed a limited budget B
$\max_{A \subseteq U} F(A)$ s.t. $C(A) \le B$
Complexity added by:
– Spatial constraints (connectivity, distance, etc.)
– Data uncertainty
– Dynamics: meta-population models, climate change
Additional Levels of Complexity: Stochasticity, Uncertainty, Large-Scale Data Modeling
• Multiple species (hundreds or thousands), with interactions (e.g. predator/prey).
• Biological and ecological issues (for a species and within species)
• Movements and migrations
• Climate change
• Other factors, e.g., different models of land conservation (purchase, conservation easements, auctions), typically over different time periods
• What different objective functions can we consider for preserving species - biodiversity?
• How to estimate population distributions and habitat suitability? Where and how to collect data?
[Figure: Eastern Phoebe migration. Source: Daniel Fink. Information Sciences.]
– Bagged decision trees: Daniel Fink, Wesley Hochachka, Art Munson, Mirek Riedewald, Ben Shaby, Giles Hooker, and Steve Kelling, 2009
– Maxent: Steven Phillips, Miro Dudik & Rob Schapire
Summary
Wildlife corridor problem
– Problem formulation
– Computational complexity issues
– Models and solution approaches
– Research questions
Our approaches clearly outperform approaches reported in the literature!
Theoretical Results: 1
NP-completeness: reduction from the Steiner Tree problem, preserving the cost function. Idea:
– The Steiner tree problem is already very similar
– Simulate edge costs with node costs
– Simulate terminal vertices with the utility function
NP-complete even without any terminals
– Recall: the Steiner tree problem is poly-time solvable with a constant number of terminals
Also holds for planar graphs
Theoretical Results: 2
NP-hardness of approximating cost optimization (factor 1.36): reduction from the Vertex Cover problem
Reduction motivated by Steiner tree work [Bern, Plassmann ’89]
vertex cover of size k iff connection subgraph with cost bound C = k and utility U = m